From steven at housecafemusic.com Tue May 1 11:27:55 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 1 May 2007 17:27:55 +0200
Subject: [Ferret-talk] Multiple Model Search
In-Reply-To: <1F0AC985-EEEC-4FE6-8E06-40E5EB73A0AD@housecafemusic.com>
References: <20070417132408.GK5558@cordoba.webit.de>
 <20070418102735.GM5558@cordoba.webit.de>
 <27CCBB0B-8FD0-412E-AF84-A481512BF21B@housecafemusic.com>
 <20070418141625.GO5558@cordoba.webit.de>
 <20070418152907.GR5558@cordoba.webit.de>
 <53F0764D-453B-4FE4-870C-54AB623E1258@housecafemusic.com>
 <20070418224044.GB22049@cordoba.webit.de>
 <20070419073947.GS5558@cordoba.webit.de>
 <1F0AC985-EEEC-4FE6-8E06-40E5EB73A0AD@housecafemusic.com>
Message-ID: <21BBECDB-DEC8-4991-9E04-FA8EA1FD5E22@housecafemusic.com>

I still have not resolved this multi_search thing. I get a nil error
for the constantize method. I have posted the complete code and full
stack trace here:

http://pastie.caboo.se/57957

Any help would be hugely appreciated

From bensaccount at yahoo.com Tue May 1 12:45:49 2007
From: bensaccount at yahoo.com (Ben)
Date: Tue, 1 May 2007 18:45:49 +0200
Subject: [Ferret-talk] AAF and DRb server
Message-ID:

I've installed the ferret gem. I installed AAF as a plugin into my
vendor/plugins directory of my project. In development environment my
searches work just fine, without any problems.

For production I configured the ferret_server.yml file with correct
information about the machine my DRb server is running on. I start the
ferret server with the following command:

ruby script/runner 'load "script/ferret_start"'

The DRb server seems to be starting fine. I get:

Starting ferret DRb server...Done.

But when I run a top, I see that my ruby process that I started
ferret_server in is sitting at 95 to 99% at all times. This happens with
no load on the server. Basically it seems that the DRb server is in some
sort of endless loop. Any search request just blocks and doesn't come
back.
But as soon as I kill the DRb ruby process, I get a connection
error, which tells me that the search was waiting for the server to
finish doing whatever it's doing.

Even when I run the stop script, I get:

Stopping ferret_server...
Sending TERM to ferret_server with PID #...Done.

But the process that started the DRb server doesn't quit and still sits
at 90 percentile range.

I tried to look at the ferret log files (ferret_server.log and
ferret_server.out and ferret_index.log), but there is nothing in there.
Just the creation date.

I checked my environment.rb and I don't have anything that recursively
loads my modules. I let the server run overnight, thinking that it is
indexing my DB but it still didn't come back. With local index files and
not DRb server, it takes ferret, at most, 15 minutes to index my DB.

I can't think of anything else to check or make sure of. Does anyone
have any idea what's going on?

Ruby 1.8.5
Rails 1.1.6
Ferret 0.11.4
Latest version of AAF as of this writing.

Thanks a lot.

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Tue May 1 13:07:20 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 1 May 2007 19:07:20 +0200
Subject: [Ferret-talk] Multiple Model Search
In-Reply-To: <21BBECDB-DEC8-4991-9E04-FA8EA1FD5E22@housecafemusic.com>
References: <27CCBB0B-8FD0-412E-AF84-A481512BF21B@housecafemusic.com>
 <20070418141625.GO5558@cordoba.webit.de>
 <20070418152907.GR5558@cordoba.webit.de>
 <53F0764D-453B-4FE4-870C-54AB623E1258@housecafemusic.com>
 <20070418224044.GB22049@cordoba.webit.de>
 <20070419073947.GS5558@cordoba.webit.de>
 <1F0AC985-EEEC-4FE6-8E06-40E5EB73A0AD@housecafemusic.com>
 <21BBECDB-DEC8-4991-9E04-FA8EA1FD5E22@housecafemusic.com>
Message-ID: <20070501170720.GA25006@cordoba.webit.de>

On Tue, May 01, 2007 at 05:27:55PM +0200, Steven Garcia wrote:
> I still have not resolved this multi_search thing. I get a nil error
> for the constantize method.
> I have posted the complete code and full
> stack trace here:
>
> http://pastie.caboo.se/57957

I'm pretty sure this is because you have an index where the class names
are missing. In one of your last mails you mentioned that you solved
this by rebuilding?

Regarding your other problem (nil.latest? in multi_search), I changed
something in the error handling in svn trunk so the problem should now
be more obvious when this happens.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Tue May 1 13:20:25 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 1 May 2007 19:20:25 +0200
Subject: [Ferret-talk] AAF and DRb server
In-Reply-To:
References:
Message-ID: <20070501172025.GB25006@cordoba.webit.de>

On Tue, May 01, 2007 at 06:45:49PM +0200, Ben wrote:
> I've installed the ferret gem. I installed AAF as a plugin into my
> vendor/plugins directory of my project. In development environment my
> searches work just fine, without any problems.
>
> For production I configured the ferret_server.yml file with correct
> information about the machine my DRb server is running on. I start the
> ferret server with the following command:
>
> ruby script/runner 'load "script/ferret_start"'
>
> The DRb server seems to be starting fine. I get:
>
> Starting ferret DRb server...Done.
>
> But when I run a top, I see that my ruby process that I started
> ferret_server in is sitting at 95 to 99% at all times. This happens with
> no load on the server. Basically it seems that the DRb server is in some
> sort of endless loop. Any search request just blocks and doesn't come
> back. But as soon as I kill the DRb ruby process, I get a connection
> error, which tells me that the search was waiting for the server to
> finish doing whatever it's doing.

Recently a similar problem came up because AR models were explicitly
required in environment.rb. Could you please try AAF trunk and start
your server with

FERRET_USE_LOCAL_INDEX=1 ruby script/runner 'load "script/ferret_start"'

This explicitly tells the server that it *is* the server and therefore
should use the local index.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From bensaccount at yahoo.com Tue May 1 13:45:15 2007
From: bensaccount at yahoo.com (Ben)
Date: Tue, 1 May 2007 19:45:15 +0200
Subject: [Ferret-talk] AAF and DRb server
In-Reply-To: <20070501172025.GB25006@cordoba.webit.de>
References: <20070501172025.GB25006@cordoba.webit.de>
Message-ID: <0157008bff215c1f5fec810aea956aa7@ruby-forum.com>

Jens Kraemer wrote:
> Recently a similar problem came up because AR models were explicitly
> required in environment.rb. Could you please try AAF trunk and start
> your server with
>
> FERRET_USE_LOCAL_INDEX=1 ruby script/runner 'load "script/ferret_start"'
>
> This explicitly tells the server that it *is* the server and therefore
> should use the local index.

That does it! It works now.

I checked my environment.rb and I couldn't find any statements that
required any of my models. I don't know what was happening before. I
guess the trunk code has some fixes and tweaks, being the latest
development snapshot.

Thanks for your quick response and great job on the AAF plugin. This is
how well-written code should be, simple and functional.

Thanks,
Ben

--
Posted via http://www.ruby-forum.com/.

From steven at housecafemusic.com Tue May 1 15:09:40 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 1 May 2007 21:09:40 +0200
Subject: [Ferret-talk] Multiple Model Search
In-Reply-To: <20070501170720.GA25006@cordoba.webit.de>
References: <27CCBB0B-8FD0-412E-AF84-A481512BF21B@housecafemusic.com>
 <20070418141625.GO5558@cordoba.webit.de>
 <20070418152907.GR5558@cordoba.webit.de>
 <53F0764D-453B-4FE4-870C-54AB623E1258@housecafemusic.com>
 <20070418224044.GB22049@cordoba.webit.de>
 <20070419073947.GS5558@cordoba.webit.de>
 <1F0AC985-EEEC-4FE6-8E06-40E5EB73A0AD@housecafemusic.com>
 <21BBECDB-DEC8-4991-9E04-FA8EA1FD5E22@housecafemusic.com>
 <20070501170720.GA25006@cordoba.webit.de>
Message-ID:

Update: It's all working now. Thanks Jens for all your help!

I need to work out my results, but that is worth a new thread methinks

cheers!

From steven at housecafemusic.com Tue May 1 15:15:47 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 1 May 2007 21:15:47 +0200
Subject: [Ferret-talk] Links in Search Results
Message-ID: <3F042EB4-52E4-4C5A-9E2C-2E0449974889@housecafemusic.com>

I am using a helper method to generate my search result listings:

def search_result_name(result)
  case result
  when Term
    "

#{result.title}

Filed under: Terms"
  when Article
    "

#{result.title}

Filed under: Articles"
  else
    result.to_s
  end
end

This works perfectly, but I hit a wall when I try to include a link
in the above method. I try using a link_to helper inside a string
interpolation block but Rails does not parse the helper.

Any ideas?

From hechengcai at tom.com Wed May 2 06:12:08 2007
From: hechengcai at tom.com (Chengcai He)
Date: Wed, 2 May 2007 12:12:08 +0200
Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_contents
Message-ID: <79ede73a047bb56bad3e5d0f05763f0f@ruby-forum.com>

In my model Topic:

acts_as_ferret({ :fields => {:username => {:store => :yes, :boost => 30},
                             :subject => {:store => :yes, :boost => 20},
                             :body => {:store => :yes, :boost => 10}},
                 :remote => true },
               { :analyzer => Ferret::Analysis::RegExpAnalyzer.new(/./, false) })

def self.full_text_search(q, options = {}, find_options = {})
  return nil if q.nil? or q==""
  default_options = {:limit => 10, :page => 1}
  options = default_options.merge options

  # get the offset based on what page we're on
  options[:offset] = options[:limit] * (options.delete(:page).to_i-1)

  # now do the query with our options
  results = Topic.find_by_contents(q, options, find_options)
  return [results.total_hits, results]
end

in my SearchController:

if params[:doSearch] == "true"
  if params[:query] == ""
    flash[:notice] = 'Please enter some words to search on.'
  else
    @conditions = " 1 = 1";
    if params[:dateRange] != ""
      @conditions += " and creationDate >= " + params[:dateRange]
    end
    if params[:forumID] != ""
      @conditions += " and forum_id = " + params[:forumID]
    end

    @total, @topics = Topic.full_text_search(params[:query], {:page =>
      (params[:page]||1)}, {:conditions => @conditions})
    @pages = pages_for(@total)
  end

it always return only 10 search results. no more search result. i don't
know why!

and this article doesn't work!
http://www.ruby-forum.com/topic/93822

thanks!

--
Posted via http://www.ruby-forum.com/.
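
A minimal sketch related to the code in the message above, under stated
assumptions. The helpers page_offset and build_conditions are
hypothetical names and are not part of acts_as_ferret. The first isolates
the offset arithmetic used in full_text_search (the clamp to page 1 is an
addition, not in the original). The second builds the conditions in the
ActiveRecord [sql, binds...] array form instead of concatenating request
parameters into the SQL string. Whether find_by_contents forwards
array-form conditions unchanged is an assumption that should be checked
against the installed AAF version.

```ruby
# Sketch only: both helpers are hypothetical, not acts_as_ferret API.

# Offset for a 1-based page number, mirroring
#   options[:limit] * (page.to_i - 1)
# from full_text_search. Clamping to page 1 is an added safeguard.
def page_offset(page, limit)
  page = page.to_i
  page = 1 if page < 1
  limit * (page - 1)
end

# Build conditions as [sql, bind1, bind2, ...] so that request
# parameters are bound as values rather than pasted into the SQL text.
def build_conditions(params)
  clauses = ["1 = 1"]
  binds = []
  unless params[:dateRange].to_s.empty?
    clauses << "creationDate >= ?"
    binds << params[:dateRange]
  end
  unless params[:forumID].to_s.empty?
    clauses << "forum_id = ?"
    binds << params[:forumID]
  end
  [clauses.join(" and ")] + binds
end

p page_offset(3, 10)
p build_conditions(:dateRange => "20070201", :forumID => "")
```

With :limit => 10, page 3 starts at offset 20, and an empty forumID
simply leaves that clause out of the conditions.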

From hechengcai at tom.com Wed May 2 06:20:17 2007
From: hechengcai at tom.com (Chengcai He)
Date: Wed, 2 May 2007 12:20:17 +0200
Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_conten
In-Reply-To: <79ede73a047bb56bad3e5d0f05763f0f@ruby-forum.com>
References: <79ede73a047bb56bad3e5d0f05763f0f@ruby-forum.com>
Message-ID:

I've put the username into the index. When I search for the username,
it returns all the topics posted by the user, more than 10. But when I
use the conditions in find_by_contents, it only returns 10 results!

I'm completely lost!

--
Posted via http://www.ruby-forum.com/.

From hechengcai at tom.com Wed May 2 06:36:18 2007
From: hechengcai at tom.com (Chengcai He)
Date: Wed, 2 May 2007 12:36:18 +0200
Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_conten
In-Reply-To:
References: <79ede73a047bb56bad3e5d0f05763f0f@ruby-forum.com>
Message-ID: <3eac8a6a25d974a830e427f263412bdd@ruby-forum.com>

It generates this MySQL statement in production.log:

SELECT * FROM topics WHERE (topics.id in
('35','68','36','69','37','17','30','29','18','31') and 1 = 1 and
creationDate >= 20070201)

--
Posted via http://www.ruby-forum.com/.

From steven at housecafemusic.com Wed May 2 07:25:16 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Wed, 2 May 2007 13:25:16 +0200
Subject: [Ferret-talk] Links in Search Results
In-Reply-To: <3F042EB4-52E4-4C5A-9E2C-2E0449974889@housecafemusic.com>
References: <3F042EB4-52E4-4C5A-9E2C-2E0449974889@housecafemusic.com>
Message-ID:

I figured it out. All I needed was to move my case statements to my
view and it works (duh!)

From steven at housecafemusic.com Wed May 2 07:34:30 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Wed, 2 May 2007 13:34:30 +0200
Subject: [Ferret-talk] Rewarding exact matches
Message-ID:

Is there a way I can get ferret to give the highest ranking to an
exact term match?

The problem I have right now is that I am searching both title and
body fields, so even if I boost the title field, if the body has more
instances of the query, then it gets pushed up in rank.

I would like for ferret to put exact matches (of the title field) at
the very top of the pile, so if I do a search for say "color", the
results look like this

• Color
• Color Theory
• Color Management

Right now the order is like this

• Color Theory
• Color Management
• Color

Because the first two articles have more instances in the body field.

Is this possible?

From kraemer at webit.de Wed May 2 07:36:41 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 2 May 2007 13:36:41 +0200
Subject: [Ferret-talk] Links in Search Results
In-Reply-To: <3F042EB4-52E4-4C5A-9E2C-2E0449974889@housecafemusic.com>
References: <3F042EB4-52E4-4C5A-9E2C-2E0449974889@housecafemusic.com>
Message-ID: <20070502113641.GA4687@cordoba.webit.de>

On Tue, May 01, 2007 at 09:15:47PM +0200, Steven Garcia wrote:
> I am using a helper method to generate my search result listings:
>
> def search_result_name(result)
>   case result
>   when Term
>     "
>
> #{result.title}
>
> Filed under: Terms"
>   when Article
>     "
>
> #{result.title}
>
> Filed under: Articles"
>   else
>     result.to_s
>   end
> end
>
> This works perfectly, but I hit a wall when I try to include a link
> in the above method. I try using a link_to helper inside a string
> interpolation block but Rails does not parse the helper.

strange, I often use link_to in helpers, i.e. to embed links in list
elements:

def navsec_link(text, link, options = {})
  "<li>#{link_to text, link, options}</li>"
end

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From steven at housecafemusic.com Wed May 2 07:44:45 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Wed, 2 May 2007 13:44:45 +0200
Subject: [Ferret-talk] Links in Search Results
In-Reply-To: <20070502113641.GA4687@cordoba.webit.de>
References: <3F042EB4-52E4-4C5A-9E2C-2E0449974889@housecafemusic.com>
 <20070502113641.GA4687@cordoba.webit.de>
Message-ID: <4956819D-B057-4447-95EE-E5EACA77C923@housecafemusic.com>

Ahhhh.. i see now. My syntax was off. I tried shoving ERB code into
the string interpolation, silly man that I am.

Still, I almost prefer this in my view since I am only using results
like this in one template.

Thanks for the pointer though, will definitely prove useful for
something else.

From kraemer at webit.de Wed May 2 07:50:03 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 2 May 2007 13:50:03 +0200
Subject: [Ferret-talk] Rewarding exact matches
In-Reply-To:
References:
Message-ID: <20070502115003.GB4687@cordoba.webit.de>

On Wed, May 02, 2007 at 01:34:30PM +0200, Steven Garcia wrote:
> Is there a way I can get ferret to give the highest ranking to an
> exact term match?
>
> The problem I have right now is that I am searching both title and
> body fields, so even if I boost the title field, if the body has more
> instances of the query, then it gets pushed up in rank.
>
> I would like for ferret to put exact matches (of the title field) at
> the very top of the pile, so if I do a search for say "color", the
> results look like this
>
> • Color
> • Color Theory
> • Color Management
>
> Right now the order is like this
>
> • Color Theory
> • Color Management
> • Color
>
> Because the first two articles have more instances in the body field.
>
> Is this possible?

If setting the boost for the title field really high (and the one for
the body really low, maybe even below 1) doesn't help, you could run
the query twice, once against the title field only, and once against
the body.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Wed May 2 08:06:12 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 2 May 2007 14:06:12 +0200
Subject: [Ferret-talk] Return which field/index a result hit was found in
In-Reply-To: <879d45c60704270627t5602ebbcw7b51e812ec3d2fc7@mail.gmail.com>
References: <879d45c60704270627t5602ebbcw7b51e812ec3d2fc7@mail.gmail.com>
Message-ID: <20070502120612.GC4687@cordoba.webit.de>

On Fri, Apr 27, 2007 at 09:27:48AM -0400, andy mitlenatch wrote:
> Hello!
>
[..]
> I have a model, 'Book', that has associations to several other models,
> comments, tags, authors, etc.. I have implemented acts_as_ferret to search
> the Book model and the associations as follows:
>
[..]
> I can successfully search the Book model and all of its associations. Next,
> I would like to print out in which field/association the hit occurred. For
> example, I would like to print to the user whether the hit was in the book
> title, a tag, in a comment or in the collection of authors, as I am only
> printing the title of the book and some higher level details in the search
> results.

The information in which field a hit occurred is not available from
Ferret results directly, so to find this out you'd have to re-run the
query against each of the fields in question for each record in your
result set (scope your search to exactly this record by using
"id:#{record.id}" as a part of your query).

Not a good solution, I must admit - might be easier (and faster) to run
separate queries for each field in the beginning and then manually
merge these results, preserving the information which field had the
hit. But beware of duplicates because of records having matches in
multiple fields. Also such records would score better if you only ran
one query across all fields - so maybe you would want to still do that
to get exact scores for sorting the results by relevancy...

I feel like there should be an easier way to do this, maybe somebody
else has a better idea?

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From steven at housecafemusic.com Wed May 2 08:17:19 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Wed, 2 May 2007 14:17:19 +0200
Subject: [Ferret-talk] Rewarding exact matches
In-Reply-To: <20070502115003.GB4687@cordoba.webit.de>
References: <20070502115003.GB4687@cordoba.webit.de>
Message-ID: <42149D18-5DEF-47E2-85DA-E1EF669106B2@housecafemusic.com>

Setting the boost as you specified didn't work. How would I set up
two queries?

My code: http://pastie.caboo.se/58230

From kraemer at webit.de Wed May 2 08:31:33 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 2 May 2007 14:31:33 +0200
Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_contents
In-Reply-To: <79ede73a047bb56bad3e5d0f05763f0f@ruby-forum.com>
References: <79ede73a047bb56bad3e5d0f05763f0f@ruby-forum.com>
Message-ID: <20070502123133.GD4687@cordoba.webit.de>

Hi!

That's a nice bug you've found there - getting the real result count
is a bit tricky when the result set is limited by both ferret's :limit
option and active record conditions.
It's one of those things I always wanted to fix but finally forgot
about ;-)

I committed a possible fix right now that comes at the cost of an
additional ferret query and a "select count(*) where ...". Would be
nice if you could try out the current trunk of aaf to see if this fixes
the problem.

If the additional queries and counting are too slow for you (but first,
give it a try ;-), you could eliminate the need for active record
conditions by indexing the forum_id and creationDate columns as
untokenized values and let ferret handle them.

Jens

On Wed, May 02, 2007 at 12:12:08PM +0200, Chengcai He wrote:
> In my model Topic:
>
> acts_as_ferret({ :fields => {:username => {:store => :yes, :boost =>
> 30}, :subject => {:store => :yes, :boost => 20}, :body => {:store =>
> :yes, :boost => 10}}, :remote => true }, { :analyzer =>
> Ferret::Analysis::RegExpAnalyzer.new(/./, false) })
>
> def self.full_text_search(q, options = {}, find_options = {})
>   return nil if q.nil? or q==""
>   default_options = {:limit => 10, :page => 1}
>   options = default_options.merge options
>
>   # get the offset based on what page we're on
>   options[:offset] = options[:limit] * (options.delete(:page).to_i-1)
>
>   # now do the query with our options
>   results = Topic.find_by_contents(q, options, find_options)
>   return [results.total_hits, results]
> end
>
> in my SearchController:
>
> if params[:doSearch] == "true"
>   if params[:query] == ""
>     flash[:notice] = 'Please enter some words to search on.'
>   else
>     @conditions = " 1 = 1";
>     if params[:dateRange] != ""
>       @conditions += " and creationDate >= " + params[:dateRange]
>     end
>     if params[:forumID] != ""
>       @conditions += " and forum_id = " + params[:forumID]
>     end
>
>     @total, @topics = Topic.full_text_search(params[:query], {:page =>
>       (params[:page]||1)}, {:conditions => @conditions})
>     @pages = pages_for(@total)
>   end
>
> it always return only 10 search results. no more search result. i don't
> know why!
>
> and this article doesn't work!
> http://www.ruby-forum.com/topic/93822
>
> thanks!
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From reverri at gmail.com Wed May 2 11:22:05 2007
From: reverri at gmail.com (Dan Reverri)
Date: Wed, 2 May 2007 17:22:05 +0200
Subject: [Ferret-talk] MultiSearcher Results Question
Message-ID: <2dca3ade964acc445bb6c9eb5b71bae5@ruby-forum.com>

If searching multiple indexes with a single searcher is there any way to
identify which index a resulting doc is stored in?

i1 = Ferret::I.new
i2 = Ferret::I.new

i1 << {:id=>1,:text=>"random stuff"}
i1.commit

reader =
  Ferret::Index::IndexReader.new([i1.options[:dir],i2.options[:dir]])

searcher = Ferret::Index::IndexSearcher.new(reader)

query = Ferret::Search::MatchAllQuery.new

searcher.search_each(query) do |doc_id,score|
  puts reader[doc_id][:text] # Any way to find what index this doc is stored in?
end

--
Posted via http://www.ruby-forum.com/.

From antti at akonniemi.fi Wed May 2 15:11:12 2007
From: antti at akonniemi.fi (Antti Akonniemi)
Date: Wed, 2 May 2007 21:11:12 +0200
Subject: [Ferret-talk] Index update - problems
Message-ID: <3074ddbff12376400d1b5ddea5864bb0@ruby-forum.com>

Hi,

Rails 1.2.2
Ruby 1.8.5
Ferret 0.11.4

I have a pretty big forum that has 2 columns that are indexed; in
addition to this, a couple more tables with far less data are indexed.

It seems that the first index update works. The mysqld process takes the
idle CPU time, but behaves nicely.. until at some point it takes 99% and
doesn't let other processes use it :) I'm forced to restart mysqld. Then
again it works for a while, but then after some new rows problems begin.

I'm using :remote => true for my models

I started my ferret server with this command:
FERRET_USE_LOCAL_INDEX=1 RAILS_ENV=production ruby script/runner 'load "script/ferret_start"'

Has anyone else bumped into this kind of behaviour? Any ideas what might
be causing it?

Regards,
Antti

--
Posted via http://www.ruby-forum.com/.

From snowstorm+rubyforum at gmail.com Thu May 3 02:56:51 2007
From: snowstorm+rubyforum at gmail.com (Yaxm Yaxm)
Date: Thu, 3 May 2007 08:56:51 +0200
Subject: [Ferret-talk] Numeric Range or comparision doesn't work
Message-ID:

Hi, it looks like Ferret still compares numeric fields by lexical
ordering, not numerical ordering. I am using Ferret 0.11.4 (I tried in
both linux and windows, the results are the same).

index = Ferret::Index::Index.new()
docs = [
  {:num => 1, :data => "yes"},
  {:num => 1, :data => "no"},
  {:num => 10, :data => "yes"},
  {:num => 10, :data => "no"},
  {:num => 100, :data => "yes"},
  {:num => 100, :data => "no"},
  {:num => 1000, :data => "yes"},
  {:num => 1000, :data => "no"}
]

?> puts index.process_query('data:yes AND num:[10 100]')
+data:yes +num:[10 100]
=> nil
>> puts index.search('d:data:yes AND num:[10 100]')
TopDocs: total_hits = 2, max_score = 1.777895 [
  2 "": 1.777895
  4 "": 1.777895
]
=> nil
>> puts index.process_query('data:yes AND num:[2 100]')
num:"data yes <> num 2 100"~4
=> nil
>> puts index.process_query('num:[2 100]')
num:"num 2 100"~2
=> nil
>> puts index.search('num:[2 100]')
TopDocs: total_hits = 0, max_score = 0.000000 [
]
=> nil
>> puts index.process_query('num:>2')
num:{2>
=> nil
>> puts index.search('num:>2')
TopDocs: total_hits = 0, max_score = 0.000000 [
]
=> nil

According to the release note for Ferret 0.10.6 at
http://rubyforge.org/forum/forum.php?forum_id=9058,
"Range queries just work. No need to pad numbers or format dates
correctly."

Is this a new bug?

Thanks.
Yaxm

--
Posted via http://www.ruby-forum.com/.
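
The results in the message above are consistent with the range bounds
being compared as strings. The following plain-Ruby sketch makes no
Ferret calls, and pad_number is a hypothetical helper. It shows the
effect on the same four values and the common workaround of indexing
fixed-width, zero-padded numbers. It illustrates the ordering problem
only and does not claim to reproduce Ferret's query parser.

```ruby
# Sketch only: pad_number is a hypothetical helper, not a Ferret API.
def pad_number(n, width = 10)
  format("%0#{width}d", n)
end

nums = [1, 10, 100, 1000]
as_strings = nums.map { |n| n.to_s }

# Compared as strings, "10", "100" and "1000" all sort before "2",
# so nothing falls inside the range ["2", "100"].
lexical_2_to_100 = as_strings.select { |s| s >= "2" && s <= "100" }

# The range ["10", "100"] happens to give the numerically expected
# answer, because "1" < "10" and "1000" > "100" as strings too.
lexical_10_to_100 = as_strings.select { |s| s >= "10" && s <= "100" }

# Fixed-width, zero-padded strings sort like the numbers they encode.
lo = pad_number(2)
hi = pad_number(100)
padded_2_to_100 = nums.map { |n| pad_number(n) }.select { |s| s >= lo && s <= hi }

p lexical_2_to_100
p lexical_10_to_100
p padded_2_to_100
```

If values are padded at indexing time, the query bounds have to be
padded to the same width as well.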

From kraemer at webit.de Thu May 3 04:00:42 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 3 May 2007 10:00:42 +0200
Subject: [Ferret-talk] Index update - problems
In-Reply-To: <3074ddbff12376400d1b5ddea5864bb0@ruby-forum.com>
References: <3074ddbff12376400d1b5ddea5864bb0@ruby-forum.com>
Message-ID: <20070503080042.GF4687@cordoba.webit.de>

On Wed, May 02, 2007 at 09:11:12PM +0200, Antti Akonniemi wrote:
> Hi,
>
> Rails 1.2.2
> Ruby 1.8.5
> Ferret 0.11.4
>
> I have pretty big forum that has 2 columns that are indexed, in addition
> to this couple more tables with far less data are indexed.
>
> It seems that first index update works. Mysqld process takes the idle
> CPU time, but behaves nicely.. until at some point it takes 99% and
> doesn't let other processes use it :) I'm forced to restart mysqld. Then
> again it works for a while, but then after some new rows problems begin.
>
> I'm using :remote => true for my models
>
> I started my ferret server with this command:
> FERRET_USE_LOCAL_INDEX=1 RAILS_ENV=production ruby script/runner 'load
> "script/ferret_start"'
>
> Has anyone else bumped into this kind of behaviour? Any ideas what might
> be causing it?

maybe there's some mysql limit (number of connections, threads, mem
usage) that gets reached? You could monitor the mysql server variables
(show status) to find out.

Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Thu May 3 04:02:21 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 3 May 2007 10:02:21 +0200
Subject: [Ferret-talk] MultiSearcher Results Question
In-Reply-To: <2dca3ade964acc445bb6c9eb5b71bae5@ruby-forum.com>
References: <2dca3ade964acc445bb6c9eb5b71bae5@ruby-forum.com>
Message-ID: <20070503080221.GG4687@cordoba.webit.de>

On Wed, May 02, 2007 at 05:22:05PM +0200, Dan Reverri wrote:
> If searching multiple indexes with a single searcher is there any way to
> identify which index a resulting doc is stored in?
>
> i1 = Ferret::I.new
> i2 = Ferret::I.new
>
> i1 << {:id=>1,:text=>"random stuff"}
> i1.commit
>
> reader =
> Ferret::Index::IndexReader.new([i1.options[:dir],i2.options[:dir]])
>
> searcher = Ferret::Index::IndexSearcher.new(reader)
>
> query = Ferret::Search::MatchAllQuery.new
>
> searcher.search_each(query) do |doc_id,score|
>   puts reader[doc_id][:text] # Any way to find what index this doc is
> stored in?
> end

I don't think so. I think I would store a flag indicating which index
the doc is in at indexing time with each doc.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From dburkes at netable.com Thu May 3 18:34:38 2007
From: dburkes at netable.com (Danny Burkes)
Date: Fri, 4 May 2007 00:34:38 +0200
Subject: [Ferret-talk] Custom analyzer weirdness with 0.11.3
Message-ID: <5689f0315cdb506ca3654a384f78cc5f@ruby-forum.com>

Hi-

I was previously using 0.11.4, and I wrote my own analyzer. Everything
worked fine.

When I took the system to production, 0.11.4 started failing when
updating the index, complaining that files were missing. The failure
always happened on the same model document, and was completely
reproducible. This failure looked a lot like the one described at
http://www.ruby-forum.com/topic/104145.

I reverted to 0.11.3, and all my model documents index fine (over 3M
documents). However, as I later found out, my custom analyzer was
returning bogus data, so the index as currently built is useless.

What I observe is that, if I specify a custom analyzer using the
:analyzer option to acts_as_ferret, the calls to my custom analyzer are
fine when using Ferret 0.11.4. However, when I reverted back to 0.11.3,
calls to my analyzer's token_stream method always have a blank string.
That is, the "input" parameter to
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html#M000324
is always a blank string. The field_name parameter is correct for both
0.11.4 and 0.11.3.

So, now, I'm in a bad situation. My custom analyzer works with 0.11.4,
but 0.11.4 fails to index my corpus. 0.11.3 will index my entire corpus,
but my custom analyzer fails, apparently due to some calling convention
differences between 0.11.3 and 0.11.4.

Does this ring a bell to anyone? I'm stuck and I would appreciate any
help I can get.

Best Regards,

Danny

--
Posted via http://www.ruby-forum.com/.

From antti at akonniemi.fi Fri May 4 01:23:58 2007
From: antti at akonniemi.fi (Antti Akonniemi)
Date: Fri, 4 May 2007 07:23:58 +0200
Subject: [Ferret-talk] Index update - problems
In-Reply-To: <20070503080042.GF4687@cordoba.webit.de>
References: <3074ddbff12376400d1b5ddea5864bb0@ruby-forum.com>
 <20070503080042.GF4687@cordoba.webit.de>
Message-ID: <3ef56608f8293c3b0e2d39f42cfbffc3@ruby-forum.com>

Found the problem! Wasn't actually Rails related. A MySQL update had
messed something up and was causing a ton of trouble.

What a relief! :)

AAF is now working like a charm!

Thanks for your time!

-antti

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Fri May 4 04:17:51 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 4 May 2007 10:17:51 +0200
Subject: [Ferret-talk] Custom analyzer weirdness with 0.11.3
In-Reply-To: <5689f0315cdb506ca3654a384f78cc5f@ruby-forum.com>
References: <5689f0315cdb506ca3654a384f78cc5f@ruby-forum.com>
Message-ID: <20070504081751.GM4687@cordoba.webit.de>

On Fri, May 04, 2007 at 12:34:38AM +0200, Danny Burkes wrote:
> Hi-
>
> I was previously using 0.11.4, and I wrote my own analyzer. Everything
> worked fine.
>
> When I took the system to production, 0.11.4 started failing when
> updating the index, complaining that files were missing. The failure
> always happened on the same model document, and was completely
> reproducible. This failure looked a lot like the one described at
> http://www.ruby-forum.com/topic/104145.

Bad you still have this problem. Did you try to run Ferret's unit tests
on that Mac?

> I reverted to 0.11.3, and all my model documents index fine (over 3M
> documents). However, as I later found out, my custom analyzer was
> returning bogus data, so the index as currently built is useless.
>
> What I observe is that, if I specify a custom analyzer using the
> :analyzer option to acts_as_ferret, the calls to my custom analyzer are
> fine when using Ferret 0.11.4. However, when I reverted back to 0.11.3,
> calls to my analyzer's token_stream method always have a blank string.
> That is, the "input" parameter to
> http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html#M000324
> is always a blank string. The field_name parameter is correct for both
> 0.11.4 and 0.11.3.

There was a conversation about this issue here right before 0.11.4 was
released, where Dave explains what is happening:
http://www.ruby-forum.com/topic/103004#231032

I'm not sure but maybe with the help of that posting you could change
your analyzer to work with 0.11.3...

jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From fengyu8299 at gmail.com Fri May 4 05:39:38 2007
From: fengyu8299 at gmail.com (Captain Feng)
Date: Fri, 4 May 2007 11:39:38 +0200
Subject: [Ferret-talk] Chinese full-text support! Still fail-_-
In-Reply-To:
References:
Message-ID:

Captain Feng wrote:
> Hi all,
> I want to use ferrent in my website but when i input chinese words, i
> have the same symptom like Chengcai. In order to fix it, i have reviewed
> all the topics about chinese support in our forum and tried all the way
> your guys suggested but still made any progress. i downloaded the latest
> version of ferret from svn.
>
> Thanks and regards.
> captain
>
> Chengcai He wrote:
>> Hello everyone!
>>
>> I use the ferret as the following:
>> acts_as_ferret :fields => [:subject, :body], :analyzer =>
>> Ferret::Analysis::RegExpAnalyzer.new(/./,false)
>>
>> when i input the english word to search, it's so cool and so soon to got
>> the result! but when i input the chinese words to search, the ruby
>> allocate all the memory and the computer has no response, after a long
>> long time wait, there's a exception: failed to allocate memory! In the
>> log file: Adding field body with value '????', it's true that ferret add
>> the chinese text into the index, but i can not search chinese words!
>>
>> I don't know how to deal with this!
> > Jens Kraemer wrote: >> On Thu, Apr 19, 2007 at 03:49:48PM +0200, Chengcai He wrote: >>> in environment.rb, i add the following code >>> $KCODE = 'u' >>> require 'jcode' >>> ENV['LANG'] = 'en_US.utf8' >>> require 'acts_as_ferret' >>> >>> in my model, topic.rb >>> acts_as_ferret :fields => [:subject, :body], :analyzer => >>> Ferret::Analysis::RegExpAnalyzer.new(/./,false) >> >> to make aaf use your analyzer, please format the call like this: >> >> acts_as_ferret { :fields => [:subject, :body] }, >> { :analyzer => >> Ferret::Analysis::RegExpAnalyzer.new(/./,false) } >> >> >> I'm seriously thinking about an API change because people always mix >> up the two hashes. >> >> Jens Finally, i moved to ubuntu and ferret performed well, supporting both English and Chinese, cheers. But i still wonder if there is any guys succesfully run ferret in Windows without the problem of Chinese support, can u share the experience? Regards. captain -- Posted via http://www.ruby-forum.com/. From dburkes at netable.com Fri May 4 09:27:05 2007 From: dburkes at netable.com (Danny Burkes) Date: Fri, 4 May 2007 15:27:05 +0200 Subject: [Ferret-talk] Custom analyzer weirdness with 0.11.3 In-Reply-To: <20070504081751.GM4687@cordoba.webit.de> References: <5689f0315cdb506ca3654a384f78cc5f@ruby-forum.com> <20070504081751.GM4687@cordoba.webit.de> Message-ID: >> When I took the system to production, 0.11.4 starting failing updating >> the index, complaining that files were missing. The failure always >> happened on the same model document, and was completely reproducible. >> This failure looked a lot like the one described at >> http://www.ruby-forum.com/topic/104145. > > Bad you still have this problem. Did you try to run Ferret's unit tests > on that Mac? > I didn't, but what I am describing here is a different problem than the one I previously described on OS X (http://www.ruby-forum.com/topic/106125). This new bug occurs in our production environment, running on Ubuntu 6.10. 
>> is always a blank string. The field_name parameter is correct for both >> 0.11.4 and 0.11.3. > > There was a conversation about this issue here right before 0.11.4 was > released, where Dave explains what is happening: > http://www.ruby-forum.com/topic/103004#231032 > > I'm not sure but maybe with the help of that posting you could change > your > analyzer to work with 0.11.3... > Thanks, I've read that thread and I think I understand what I need to do to get my custom analyzer working with 0.11.3. I'll go that route for now. Thanks for your help! Best Regards, Danny -- Posted via http://www.ruby-forum.com/. From doug.arogos at gmail.com Fri May 4 20:50:39 2007 From: doug.arogos at gmail.com (Doug Smith) Date: Fri, 4 May 2007 17:50:39 -0700 Subject: [Ferret-talk] Stop words, fields, StandardAnalyzer quagmire Message-ID: <42d8808f0705041750lf19dd10oec54dbd8f12d0d3d@mail.gmail.com> Hello, I'm using: Ruby 1.8.6, Rails 1.2.3, ferret 0.11.4, acts_as_ferret from svn stable. I've had quite a day wrestling with trying to remove the use of stopwords. The problem was that when searching for words like "no" or "the", no results were found. I found a confusing thing behavior that has taken me some time to figure out, and I hope sharing it saves someone else some time. >From searching around online and in the source code I came up with the following config in my ActiveRecord model: acts_as_ferret({:fields => {:name => {:boost => 10}, :type => {:boost => 2}, :email => {:boost => 10}, :bio => {:store => :no}, :status_id => {:boost => 1}}, :store_class_name => true, :remote => true, :ferret => { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) } } ) With the StandardAnalyzer added, I do find results with "no" or "the". The complicating factor is that as you can see, I have a field "status_id". This field lets me filter for profiles that are published or draft in my CMS. 
Before I added the StandardAnalyzer, the status_id field worked fine in queries like this: a = Profile.find_by_contents("smith status_id:100") a.total_hits => 2 # this is correct, only 2 are published a = Profile.find_by_contents("smith") a.total_hits => 4 # this is correct, there are 4 total So, you can see that the status_id was automatically "AND"-ed to the query word. However, after adding the above StandardAnalyzer config, the status_id was now "OR"-ed, like so: a = Profile.find_by_contents("no") a.total_hits => 5 # this is good a = Profile.find_by_contents("no status_id:100") a.total_hits => 208 # this is bad -- it's the same as if I only searched for status_id:100. a = Profile.find_by_contents("smith status_id:100") a.total_hits => 208 # this is just as bad -- it's the same as if I only searched for status_id:100. The fix here is to add the AND keyword explicitly to the query: a = Profile.find_by_contents("smith AND status_id:100") a.total_hits => 2 # works just like before. In fact, OR becomes the default search regardless of whether I use a field in the query: a = Profile.find_by_contents("smith jones") a.total_hits => 5 # OR'ed results a = Profile.find_by_contents("smith AND jones") a.total_hits => 0 Again, before StandardAnalyzer, "AND" was the default so the first "smith jones" query would have returned 0 as it should. Any insight as to why this might be? I would prefer AND to be the default. Thanks, Doug From snowstorm+rubyforum at gmail.com Sun May 6 23:47:06 2007 From: snowstorm+rubyforum at gmail.com (Yaxm Yaxm) Date: Mon, 7 May 2007 05:47:06 +0200 Subject: [Ferret-talk] Querying against numeric fields? e.g. price:( >= min_pri In-Reply-To: References: <6fdc0ba6d6fd714df36db1796e7449b3@ruby-forum.com> Message-ID: <0e7a3d633a2a8e4557cc85ae3fbbae89@ruby-forum.com> This doesn't work because "10" is used. 
> puts index.process_query('data:yes AND num:[10 100]')
> puts index.search('data:yes AND num:[10 100]')

The integers in the query must be padded with zeros to the same width the
analyzer uses:

puts index.process_query('data:yes AND num:[00010 00100]')
puts index.search('data:yes AND num:[00010 00100]')

To use this in a Rails app, put the IntegerAnalyzer definition in a file,
then require that file in your model. Inside your model class:

include Ferret::Analysis
analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new)
analyzer[:num] = IntegerAnalyzer.new(5)
analyzer[:blah_field] = IntegerAnalyzer.new(5)
acts_as_ferret( {:fields => [:blah_field] } , {:analyzer => analyzer} )

David Balmain wrote:
> On 9/13/06, Tom Beddard wrote:
>> integer (so no .00 to cause confusion) I'm getting results in the 50
>> value range as well as 500 if I set the min price as 500. I presume
>> ferret is doing the price as a string comparison, but is there any way
>> to make it do a numeric match?
>>
>> Thanks
>
> Hi Tom,
>
> You need to pad all numbers to a fixed width when adding them to the
> index as well as when querying the index. Usually you'd write the code
> to do this yourself. I've recently come up with another way to do
> this.
>
> require 'ferret'
>
> module Ferret::Analysis
>   class IntegerTokenizer
>     def initialize(num, width)
>       @num = num.to_i
>       @width = width
>       @done = false
>     end
>     def next
>       if @done
>         return nil
>       else
>         @done = true
>         puts Token.new("%0#{@width}d" % @num, 0, @width)
>         return Token.new("%0#{@width}d" % @num, 0, @width)
>       end
>     end
>     def text=(text)
>       @num = text.to_i
>       @done = false
>     end
>   end
>
>   class IntegerAnalyzer
>     def initialize(width)
>       @width = width
>     end
>     def token_stream(field, input)
>       return IntegerTokenizer.new(input, @width)
>     end
>   end
> end
>
> include Ferret::Analysis
> analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new)
> analyzer[:num] = IntegerAnalyzer.new(5)
>
> index = Ferret::Index::Index.new(:analyzer => analyzer)
> docs = [
>   {:num => 1, :data => "yes"},
>   {:num => 1, :data => "no"},
>   {:num => 10, :data => "yes"},
>   {:num => 10, :data => "no"},
>   {:num => 100, :data => "yes"},
>   {:num => 100, :data => "no"},
>   {:num => 1000, :data => "yes"},
>   {:num => 1000, :data => "no"}
> ]
>
> docs.each { |d| index << d }
>
> puts index.process_query('data:yes AND num:[10 100]')
> puts index.search('data:yes AND num:[10 100]')
>
> This will only work with the working copy of Ferret from the
> subversion repository. I'm still not convinced that this is the best
> way to do it.
>
> Cheers,
> Dave

--
Posted via http://www.ruby-forum.com/.

From snowstorm+rubyforum at gmail.com Sun May 6 23:49:06 2007
From: snowstorm+rubyforum at gmail.com (Yaxm Yaxm)
Date: Mon, 7 May 2007 05:49:06 +0200
Subject: [Ferret-talk] Numeric Range or comparision doesn't work
In-Reply-To:
References:
Message-ID: <9084a87471fa58efa1a13e289b5bd635@ruby-forum.com>

To answer my own question: the problem was that I didn't pad the numbers in
my query with zeros the same way the IntegerAnalyzer does.
# rip from the "Ferret" ebook at http://www.oreilly.com/catalog/9780596527853/index.html
module Ferret::Analysis
  # range comparison is done by lexical order, not numeric order
  class IntegerTokenizer
    def initialize(num, width)
      @num = num.to_i
      @width = width
    end
    def next
      token = Token.new("%0#{@width}d" % @num, 0, @width) if @num
      @num = nil
      return token
    end
    def text=(text)
      @num = text.to_i
    end
  end

  class IntegerAnalyzer
    def initialize(width)
      @width = width
    end
    def token_stream(field, input)
      return IntegerTokenizer.new(input, @width)
    end
  end
end

include Ferret::Analysis
analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new)
# "num" is the name of the field that uses this analyzer
analyzer[:num] = IntegerAnalyzer.new(3)
index = Ferret::Index::Index.new(:analyzer => analyzer)
docs = [
  {:num => 5},
  {:num => 15},
  {:num => 30}
]
docs.each { |d| index << d }

>> puts index.search('num:[001 020]')
TopDocs: total_hits = 2, max_score = 1.000000 [
  0 "": 1.000000
  1 "": 1.000000
]
=> nil
>>
?> puts index.search('num:[010 100]')
TopDocs: total_hits = 2, max_score = 1.000000 [
  1 "": 1.000000
  2 "": 1.000000
]

Yaxm Yaxm wrote:
> Hi,
> it looks like Ferret still compares numeric fields by lexical ordering,
> not numerical ordering. I am using Ferret 0.11.4(I tried in both linux
> and windows, the results are the same).
>
>
> index = Ferret::Index::Index.new()
> docs = [
>   {:num => 1, :data => "yes"},
>   {:num => 1, :data => "no"},
>   {:num => 10, :data => "yes"},
>   {:num => 10, :data => "no"},
>   {:num => 100, :data => "yes"},
>   {:num => 100, :data => "no"},
>   {:num => 1000, :data => "yes"},
>   {:num => 1000, :data => "no"}
> ]
>
> ?> puts index.process_query('data:yes AND num:[10 100]')
> +data:yes +num:[10 100]
> => nil
>>> puts index.search('d:data:yes AND num:[10 100]')
> TopDocs: total_hits = 2, max_score = 1.777895 [
>   2 "": 1.777895
>   4 "": 1.777895
> ]
> => nil
>>> puts index.process_query('data:yes AND num:[2 100]')
> num:"data yes <> num 2 100"~4
> => nil
>>> puts index.process_query('num:[2 100]')
> num:"num 2 100"~2
> => nil
>>> puts index.search('num:[2 100]')
> TopDocs: total_hits = 0, max_score = 0.000000 [
> ]
> => nil
>>> puts index.process_query('num:>2')
> num:{2>
> => nil
>>> puts index.search('num:>2')
> TopDocs: total_hits = 0, max_score = 0.000000 [
> ]
> => nil
>
>
> According to the release note for Ferret 0.10.6 at
> http://rubyforge.org/forum/forum.php?forum_id=9058, "Range queries just
> work. No need to pad numbers or format dates correctly."
>
> Is this a new bug?
>
> Thanks.
> Yaxm

--
Posted via http://www.ruby-forum.com/.

From ab.mahmoodi at gmail.com Mon May 7 04:34:15 2007
From: ab.mahmoodi at gmail.com (Abolfazl Mahmoodi)
Date: Mon, 7 May 2007 10:34:15 +0200
Subject: [Ferret-talk] using acts_as_ferret in persian language
Message-ID: <8b7bedf545c4d3a23b7d5682f0db82cf@ruby-forum.com>

Hi,

I installed ferret and acts_as_ferret successfully, but searching for
Persian characters does not return correct results. My code:

acts_as_ferret :fields => [:fname] , :analyzer =>
Ferret::Analysis::RegExpAnalyzer.new(/./,false)

--
Posted via http://www.ruby-forum.com/.
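
[Editor's sketch, not part of the original thread.] The zero-padding point from the numeric range thread above can be illustrated without Ferret at all. The following is a minimal plain-Ruby sketch; `pad_number` and `padded_range_query` are hypothetical helper names, not part of Ferret or acts_as_ferret, and the default width of 5 is only an assumption matching the `IntegerAnalyzer.new(5)` example. It shows that unpadded numeric strings sort lexically, while fixed-width padded strings sort in numeric order, and it builds a range query string with padded bounds.

```ruby
# Hypothetical helper (not a Ferret API): pad an integer to a fixed width
# so that lexical (string) order agrees with numeric order.
def pad_number(num, width = 5)
  "%0#{width}d" % num.to_i
end

# Hypothetical helper: build a range query string with padded bounds,
# in the same form as the queries quoted in the thread.
def padded_range_query(field, min, max, width = 5)
  "#{field}:[#{pad_number(min, width)} #{pad_number(max, width)}]"
end

unpadded = %w[1 10 100 1000 2 50 500]
padded   = unpadded.map { |n| pad_number(n) }

# Strings compare character by character, so "1000" sorts before "2".
puts unpadded.sort.inspect
# With a fixed width, string order and numeric order coincide.
puts padded.sort.inspect
# The query bounds must use the same width as the indexed terms.
puts padded_range_query(:num, 10, 100)
```

The width has to be large enough for the biggest value that will ever be indexed; a value wider than the padding would break the ordering again.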
From antti at akonniemi.fi Mon May 7 23:02:40 2007
From: antti at akonniemi.fi (Antti Akonniemi)
Date: Tue, 8 May 2007 05:02:40 +0200
Subject: [Ferret-talk] Limit + Conditions = Confusion
Message-ID:

Hi,

Could you please explain the :limit to me? I thought I knew what it was
but now it seems I've misunderstood something.

>> Fooditem.find_by_contents("*quark*", {:limit => 10}, {:conditions => ["origin = ?", "fda"]}).length
=> 1
>> Fooditem.find_by_contents("*quark*", {:limit => 2000000}, {:conditions => ["origin = ?", "fda"]}).length
=> 8

So is limit actually limiting the results before conditions?

Regards,
Antti

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Tue May 8 03:54:24 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 8 May 2007 09:54:24 +0200
Subject: [Ferret-talk] Limit + Conditions = Confusion
In-Reply-To:
References:
Message-ID: <20070508075424.GD25267@cordoba.webit.de>

On Tue, May 08, 2007 at 05:02:40AM +0200, Antti Akonniemi wrote:
> Hi,
>
> Could you please explain the :limit to me? I thought I knew what it was
> but now it seems I've misunderstood something.
>
> >> Fooditem.find_by_contents("*quark*", {:limit => 10}, {:conditions => ["origin = ?", "fda"]}).length
> => 1
> >> Fooditem.find_by_contents("*quark*", {:limit => 2000000}, {:conditions => ["origin = ?", "fda"]}).length
> => 8
>
>
> So is limit actually limiting the results before conditions?

yes, the limit option you're using (in the first options hash) is passed
to ferret while AR conditions are applied after running the
ferret search.

To get exact results you should use :limit => :all in the first hash and
put the 'real' limit into the AR options hash.

Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From snowstorm+rubyforum at gmail.com Tue May 8 05:14:07 2007
From: snowstorm+rubyforum at gmail.com (Yaxm Yaxm)
Date: Tue, 8 May 2007 11:14:07 +0200
Subject: [Ferret-talk] case sensitivity for untokenized fields
Message-ID: <4d324e9294c627ede341d397c542a1aa@ruby-forum.com>

Hi,

I have an address model. I made the city and state fields untokenized.
It looks like Ferret doesn't downcase these fields in the index, so the
search can't be done case-insensitively. How do I solve this problem?
Downcase the indexed terms as well as the search value? Is there a
simpler solution?

Thanks.
Yaxm

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Tue May 8 05:35:48 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 8 May 2007 11:35:48 +0200
Subject: [Ferret-talk] Stop words, fields, StandardAnalyzer quagmire
In-Reply-To: <42d8808f0705041750lf19dd10oec54dbd8f12d0d3d@mail.gmail.com>
References: <42d8808f0705041750lf19dd10oec54dbd8f12d0d3d@mail.gmail.com>
Message-ID: <20070508093548.GG25267@cordoba.webit.de>

Hi!

On Fri, May 04, 2007 at 05:50:39PM -0700, Doug Smith wrote:
> Hello,
>
> I'm using: Ruby 1.8.6, Rails 1.2.3, ferret 0.11.4, acts_as_ferret from
> svn stable.
[..]
> acts_as_ferret({:fields => {:name => {:boost => 10},
>                             :type => {:boost => 2},
>                             :email => {:boost => 10},
>                             :bio => {:store => :no},
>                             :status_id => {:boost => 1}},
>                 :store_class_name => true,
>                 :remote => true,
>                 :ferret => { :analyzer =>
> Ferret::Analysis::StandardAnalyzer.new([]) }
>                } )
>
> With the StandardAnalyzer added, I do find results with "no" or "the".
> The complicating factor is that as you can see, I have a field
> "status_id". This field lets me filter for profiles that are
> published or draft in my CMS.
>
[..]
> In fact, OR becomes the default search regardless of whether I use a > field in the query: [..] > Again, before StandardAnalyzer, "AND" was the default so the first > "smith jones" query would have returned 0 as it should. > > Any insight as to why this might be? I would prefer AND to be the default. Then you shouldn't override acts_as_ferret's default behaviour by using the completely unsupported and only internally used :ferret option :-) I admit that this is a bug in how aaf handles it's parameters and I'll fix this, however for thetime being you can use this statement which should work as intended: acts_as_ferret({ :fields => {:name => {:boost => 10}, :type => {:boost => 2}, :email => {:boost => 10}, :bio => {:store => :no}, :status_id => {:boost => 1}}, :store_class_name => true, :remote => true }, { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) }) Please note the difference: the analyzer option is part of a second options hash. The reason for this separation is that AAF more or less passes the last hash directly to Ferret, while the first option hash is used for aaf options Ferret itself doesn't know about. However I plan to rework this in the Future so then your original statement should work correctly then. Btw, where did you find that solution? I've never seen the :ferret option being used outside aaf before. Jens -- Jens Kr?mer webit! 
Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Tue May 8 05:51:42 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 8 May 2007 11:51:42 +0200 Subject: [Ferret-talk] Stop words, fields, StandardAnalyzer quagmire In-Reply-To: <20070508093548.GG25267@cordoba.webit.de> References: <42d8808f0705041750lf19dd10oec54dbd8f12d0d3d@mail.gmail.com> <20070508093548.GG25267@cordoba.webit.de> Message-ID: <20070508095142.GH25267@cordoba.webit.de> On Tue, May 08, 2007 at 11:35:48AM +0200, Jens Kraemer wrote: [..] > > acts_as_ferret({:fields => {:name => {:boost => 10}, > > :type => {:boost => 2}, > > :email => {:boost => 10}, > > :bio => {:store => :no}, > > :status_id => {:boost => 1}}, > > :store_class_name => true, > > :remote => true, > > :ferret => { :analyzer => > > Ferret::Analysis::StandardAnalyzer.new([]) } > > } ) > > I just committed a fix so that the above call should be working correctly now. I'd go so far to say that this should be the preferred way of passing ferret options to aaf now. The two-hash calling style I suggested below will still work of course, so nothing should break. Thoughts anyone? Old calling style: > > acts_as_ferret({ :fields => {:name => {:boost => 10}, > :type => {:boost => 2}, > :email => {:boost => 10}, > :bio => {:store => :no}, > :status_id => {:boost => 1}}, > :store_class_name => true, > :remote => true > }, { > :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) > }) > > Please note the difference: the analyzer option is part of a second > options hash. > -- Jens Kr?mer webit! 
Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From antti at akonniemi.fi Tue May 8 07:20:08 2007 From: antti at akonniemi.fi (Antti Akonniemi) Date: Tue, 8 May 2007 13:20:08 +0200 Subject: [Ferret-talk] Limit + Conditions = Confusion In-Reply-To: <20070508075424.GD25267@cordoba.webit.de> References: <20070508075424.GD25267@cordoba.webit.de> Message-ID: <8ac4af95c0071bfd4cd1a7799a3f7e51@ruby-forum.com> Jens Kraemer wrote: > On Tue, May 08, 2007 at 05:02:40AM +0200, Antti Akonniemi wrote: >> >> So is limit actually limiting the results before conditions? > > yes, the limit option you're using (in the first options hash) is passed > to ferret while AR conditions are applied after running the > ferret search. > > To get exact results you should use :limit => :all in the first hash and > put the 'real' limit into the AR options hash. Got it working now! Thank you so much! -antti -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue May 8 07:52:46 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 8 May 2007 13:52:46 +0200 Subject: [Ferret-talk] Rewarding exact matches In-Reply-To: <42149D18-5DEF-47E2-85DA-E1EF669106B2@housecafemusic.com> References: <20070502115003.GB4687@cordoba.webit.de> <42149D18-5DEF-47E2-85DA-E1EF669106B2@housecafemusic.com> Message-ID: <20070508115246.GJ25267@cordoba.webit.de> On Wed, May 02, 2007 at 02:17:19PM +0200, Steven Garcia wrote: > Setting the boost as you specified didn't work. > > How would I set up two queries? 
>
> My code: http://pastie.caboo.se/58230

that should do the trick:

class Search
  attr_accessor :query

  def initialize(query)
    @query = query
  end

  def do_search
    returning [] do |results|
      results << Term.multi_search("title:(#{query})", [Article, Term])
      results << Term.multi_search("#{query} -title:(#{query})", [Article, Term])
    end
  end
end

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Tue May 8 08:16:27 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 8 May 2007 14:16:27 +0200
Subject: [Ferret-talk] Rewarding exact matches
In-Reply-To: <20070508115246.GJ25267@cordoba.webit.de>
References: <20070502115003.GB4687@cordoba.webit.de> <42149D18-5DEF-47E2-85DA-E1EF669106B2@housecafemusic.com> <20070508115246.GJ25267@cordoba.webit.de>
Message-ID: <20070508121627.GK25267@cordoba.webit.de>

For better compatibility with the code using your search model use aaf's
SearchResults class:

def do_search
  title_results = Term.multi_search( "title:(#{query})", [Article, Term] )
  body_results = Term.multi_search( "#{query} -title:(#{query})", [Article, Term] )
  ActsAsFerret::SearchResults.new( title_results + body_results,
                                   title_results.total_hits + body_results.total_hits )
end

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From none at bbq.com Tue May 8 09:24:25 2007
From: none at bbq.com (mixplate)
Date: Tue, 8 May 2007 15:24:25 +0200
Subject: [Ferret-talk] acts as ferret javascript.back fails after a search result
Message-ID: <2f0fdc9a72c34b4fd2b1f59603293ce7@ruby-forum.com>

hi,
i use ferret in my application and when the user uses the search, i return some records.
the user then clicks on a result to view details. on the details page, i have a javascript.history.back to return to the search result. however, i get the expired results page and the user has to refresh the browser. is there a simple way to solve this? thanks. -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue May 8 09:35:30 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 8 May 2007 15:35:30 +0200 Subject: [Ferret-talk] acts as ferret javascript.back fails after a search result In-Reply-To: <2f0fdc9a72c34b4fd2b1f59603293ce7@ruby-forum.com> References: <2f0fdc9a72c34b4fd2b1f59603293ce7@ruby-forum.com> Message-ID: <20070508133530.GA9575@cordoba.webit.de> On Tue, May 08, 2007 at 03:24:25PM +0200, mixplate wrote: > hi, > i use ferret in my application and when the user uses the search, i > return some records. the user then clicks on a result to view details. > on the details page, i have a javascript.history.back to return to the > search result. however, i get the expired results page and the user has > to refresh the browser. is there a simple way to solve this? Do not use history.back in your link but use link_to to link to your search action? -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From doug.arogos at gmail.com Tue May 8 11:14:34 2007 From: doug.arogos at gmail.com (Doug Smith) Date: Tue, 8 May 2007 08:14:34 -0700 Subject: [Ferret-talk] Stop words, fields, StandardAnalyzer quagmire In-Reply-To: <20070508093548.GG25267@cordoba.webit.de> References: <42d8808f0705041750lf19dd10oec54dbd8f12d0d3d@mail.gmail.com> <20070508093548.GG25267@cordoba.webit.de> Message-ID: <42d8808f0705080814x9952a40nfa10a58b348b8578@mail.gmail.com> On 5/8/07, Jens Kraemer wrote: > Hi! 
> > However I plan to rework this in the Future so then your original statement > should work correctly then. Btw, where did you find that solution? I've > never seen the :ferret option being used outside aaf before. Hi Jens, Thank you for your fast response. I found this as an option by searching through the aaf source code. There was a commented out version of it in act_methods.rb, the acts_as_ferret() method. I'll try your latest change and let you know how it works. Thanks again, Doug From doug.arogos at gmail.com Tue May 8 11:54:01 2007 From: doug.arogos at gmail.com (Doug Smith) Date: Tue, 8 May 2007 08:54:01 -0700 Subject: [Ferret-talk] Stop words, fields, StandardAnalyzer quagmire In-Reply-To: <20070508095142.GH25267@cordoba.webit.de> References: <42d8808f0705041750lf19dd10oec54dbd8f12d0d3d@mail.gmail.com> <20070508093548.GG25267@cordoba.webit.de> <20070508095142.GH25267@cordoba.webit.de> Message-ID: <42d8808f0705080854u7606c2e0wc145ed2a92f9bdcc@mail.gmail.com> On 5/8/07, Jens Kraemer wrote: > On Tue, May 08, 2007 at 11:35:48AM +0200, Jens Kraemer wrote: > [..] > > > acts_as_ferret({:fields => {:name => {:boost => 10}, > > > :type => {:boost => 2}, > > > :email => {:boost => 10}, > > > :bio => {:store => :no}, > > > :status_id => {:boost => 1}}, > > > :store_class_name => true, > > > :remote => true, > > > :ferret => { :analyzer => > > > Ferret::Analysis::StandardAnalyzer.new([]) } > > > } ) > > > > > I just committed a fix so that the above call should be working > correctly now. I'd go so far to say that this should be the preferred > way of passing ferret options to aaf now. The two-hash calling style I > suggested below will still work of course, so nothing should break. Hi Jens, This is excellent. It works well in my initial testing. I think it's a great way to go. 
Thanks for your great support, Doug From robl at monkeyhelper.com Wed May 9 06:59:44 2007 From: robl at monkeyhelper.com (Rob Lee) Date: Wed, 9 May 2007 12:59:44 +0200 Subject: [Ferret-talk] more_like_this Message-ID: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com> Hi, I'm using acts_as_ferret in my rails application and I'd like to use more_like_this to retrieve some 'similar' item suggestions. I have a class 'items' which has a status field and I need to retrieve items that only have one of the two possible statuses. Looking at the more_like_this method indicates it supports an :append_to_query option that allows you to specify a proc that will modify the query object before the query is 'run'. This would seem to allow me to specify extra conditions to the query (such as +status:live). Item.more_like_this(:field_names => [:title, :description, :status], :append_to_query => Proc .... ) It's a little unclear exactly what the query object is and there seem to be no examples I can find outlining how to use this functionality, does anybody have an example they could contribute ? Thanks -- Posted via http://www.ruby-forum.com/. From senser.simon at gmail.com Wed May 9 07:26:30 2007 From: senser.simon at gmail.com (Jin) Date: Wed, 9 May 2007 13:26:30 +0200 Subject: [Ferret-talk] How did ferret sort by the column? Message-ID: <34a5fe9b2d8ce3b33b8b56a2eafea059@ruby-forum.com> Consider that if i have a column and the type is int or date then i pass the :sort field with it i must sort by the value but if a column which type is string and i set the :sort as it how did ferret sort by it? seems it is sort by the first letter of this column according to the result i read the source code but stopped in the c code,because i am not familiar with it and it is complex... so can anybody give me the answer? hope i have describe this question clearly I want to know the mechanism thank you -- Posted via http://www.ruby-forum.com/. 
From kraemer at webit.de Wed May 9 07:58:03 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 9 May 2007 13:58:03 +0200
Subject: [Ferret-talk] more_like_this
In-Reply-To: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com>
References: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com>
Message-ID: <20070509115803.GJ9575@cordoba.webit.de>

On Wed, May 09, 2007 at 12:59:44PM +0200, Rob Lee wrote:
> Hi,
>
> I'm using acts_as_ferret in my rails application and I'd like to use
> more_like_this to retrieve some 'similar' item suggestions. I have a
> class 'items' which has a status field and I need to retrieve items that
> only have one of the two possible statuses.
>
> Looking at the more_like_this method indicates it supports an
> :append_to_query option that allows you to specify a proc that will
> modify the query object before the query is 'run'. This would seem to
> allow me to specify extra conditions to the query (such as
> +status:live).
>
> Item.more_like_this(:field_names => [:title, :description, :status],
> :append_to_query => Proc .... )
>
> It's a little unclear exactly what the query object is and there seem to
> be no examples I can find outlining how to use this functionality, does
> anybody have an example they could contribute ?

I don't have an example at hand, but maybe I can help anyway. The
parameter passed to the Proc is a BooleanQuery instance. You can add your
own conditions by adding your own Query to it:

query.add_query(Ferret::Search::TermQuery.new(:status, 'live'), :must)

Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From jhorneman at intelligent-artifice.com Wed May 9 15:47:00 2007
From: jhorneman at intelligent-artifice.com (Jurie Horneman)
Date: Wed, 9 May 2007 21:47:00 +0200
Subject: [Ferret-talk] Constant 0.11.4 Errors
In-Reply-To:
References: <6afb2b2d0ca571fb1836ad044fe13854@ruby-forum.com> <5d2e7c85e61024937428a281650070c2@ruby-forum.com> <76685bc50704100857l2ca72916m9372284d803cbfcf@mail.gmail.com> <846f30c70704111157q3a56f15aw96f68da682e066bb@mail.gmail.com> <846f30c70704121510w8cbd21ai86565bf9fc1d7050@mail.gmail.com> <20070413071258.GM16943@cordoba.webit.de> <846f30c70704161918k6fb45324m57a263a7278c34b7@mail.gmail.com> <7486fad0801d800ec91fdf0c44e48e16@ruby-forum.com> <291875d8acfb198df852c08cfc8dec35@ruby-forum.com>
Message-ID: <6cf58895a19f155076c68dfb54520de8@ruby-forum.com>

Joey Geiger wrote:
> Just wanted to add that I had a similar error. Just updated from 11.3 to
> 11.4 and started getting the FileNotFound errors as stated above.
>
> The only change to the application was an update from 11.3 to 11.4.
>
> I have reverted back to 11.3 and it seems to be working again.

Had the same problem, solved it the same way. One interesting thing
perhaps: this problem only occurred when running entire test suites,
never when running individual tests (in TextMate).

--
Posted via http://www.ruby-forum.com/.

From me at phillipoertel.com Wed May 9 17:59:59 2007
From: me at phillipoertel.com (Phillip Oertel)
Date: Wed, 9 May 2007 23:59:59 +0200
Subject: [Ferret-talk] bug when assigning new analyzer?
Message-ID: <01f86ed16fd2c08775af82d094131e8e@ruby-forum.com>

require 'rubygems'
require 'ferret'
include Ferret

PATH = '/tmp/ferret_stopwords_test'

index = Index::IndexWriter.new(:path => PATH, :create => true)

index.analyzer = Analysis::StandardAnalyzer.new([])
index << {:title => 'a few good men', :language => 'en'}

index.analyzer = Analysis::StandardAnalyzer.new(['men'])
index << {:title => 'a few good men', :language => 'nl'}

index.close

searcher = Index::Index.new(:path => PATH)
puts searcher.search('*:men AND language:nl').total_hits
#=> 1

i'd expect zero results, as 'men' is a stopword at the time of indexing
with language:nl. is this a bug or a lack of understanding on my part?

a workaround would be to close and reopen the index after every language;
that returns the expected zero. don't know how much overhead that would be.

i am on ruby 1.8.5 / os x.

any assistance would be greatly appreciated since i have no clue why
this happens ...

cheers,
phillip

-- 
Posted via http://www.ruby-forum.com/.

From me at phillipoertel.com Wed May 9 18:04:38 2007
From: me at phillipoertel.com (Phillip Oertel)
Date: Thu, 10 May 2007 00:04:38 +0200
Subject: [Ferret-talk] bug when assigning new analyzer?
In-Reply-To: <01f86ed16fd2c08775af82d094131e8e@ruby-forum.com>
References: <01f86ed16fd2c08775af82d094131e8e@ruby-forum.com>
Message-ID: <9968535c42b35ce97d254b92edb366e1@ruby-forum.com>

* addendum 1: i use ferret 0.11.4

* addendum 2: when i comment out the first index.analyzer assignment, i
get:

/Users/phillip/Sites/ruby/playground/ferret_stopwords.rb:13: [BUG] Bus Error
ruby 1.8.5 (2006-12-25) [i686-darwin8.8.2]

* addendum 3: the underlying problem i have is that i have many different
languages that have to be correctly indexed. is there a best practise
how to do that? i mean, better than having one index and switching the
analyzer around?

thanks again,
phillip

-- 
Posted via http://www.ruby-forum.com/.
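
To make the expectation in the stop word report above concrete, here is a toy model in plain Ruby. It is not Ferret's StandardAnalyzer; the `analyze` helper and the in-memory `docs` array are made up for illustration. It only shows what "zero results" would mean if each document really were tokenized with the stop list that was active when it was added.

```ruby
# Toy model only: plain Ruby, not Ferret's StandardAnalyzer.
# analyze() is a hypothetical helper that lowercases, splits on word
# characters and drops the given stop words.
def analyze(text, stop_words)
  text.downcase.scan(/\w+/).reject { |token| stop_words.include?(token) }
end

# Each document is "indexed" with the stop list active at the time.
docs = [
  { :language => 'en', :title => analyze('a few good men', []) },
  { :language => 'nl', :title => analyze('a few good men', ['men']) }
]

# Equivalent in spirit to "title contains men AND language is nl".
hits = docs.select { |d| d[:language] == 'nl' && d[:title].include?('men') }
puts hits.size
```

Under this model the nl document never contains the token 'men', so the count is zero. Whether Ferret applies a newly assigned analyzer to documents added afterwards is exactly the open question in the report, and the toy model says nothing about that.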
From vince71 at gmail.com Wed May 9 20:49:39 2007 From: vince71 at gmail.com (Vince W.) Date: Thu, 10 May 2007 02:49:39 +0200 Subject: [Ferret-talk] Constant 0.11.4 Errors In-Reply-To: References: <6afb2b2d0ca571fb1836ad044fe13854@ruby-forum.com> <5d2e7c85e61024937428a281650070c2@ruby-forum.com> <76685bc50704100857l2ca72916m9372284d803cbfcf@mail.gmail.com> <846f30c70704111157q3a56f15aw96f68da682e066bb@mail.gmail.com> <846f30c70704121510w8cbd21ai86565bf9fc1d7050@mail.gmail.com> <20070413071258.GM16943@cordoba.webit.de> <846f30c70704161918k6fb45324m57a263a7278c34b7@mail.gmail.com> <7486fad0801d800ec91fdf0c44e48e16@ruby-forum.com> <291875d8acfb198df852c08cfc8dec35@ruby-forum.com> Message-ID: <2dae7e518520d8c900e857c2b97b4e35@ruby-forum.com> Joey Geiger wrote: > Just wanted to add that I had a similar error. Just updated from 11.3 to > 11.4 and started getting the FileNotFound errors as stated above. > > The only change to the application was an update from 11.3 to 11.4. > > I have reverted back to 11.3 and it seems to be working again. Hmm.. not wanting to make a mistake here so allow me to ask a silly question. Do I just sudo gem install ferret, pick the 0.11.3 version, and then erase the 0.11.4 gem? -- Posted via http://www.ruby-forum.com/. From allenmacyoung at gmail.com Thu May 10 03:12:19 2007 From: allenmacyoung at gmail.com (Allen Young) Date: Thu, 10 May 2007 09:12:19 +0200 Subject: [Ferret-talk] Is there a way to do incremental search? Message-ID: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> Say the user first enters "ruby" for search and gets 1000 results. Then he can search "rails" just in the 1000 results just returned. The common scenario is some kind of advanced search. User can incrementally add criteria and the program will narrow the results step by step. I know that at least I can use all the criteria as a whole to do the searching, but this is a waste and I'm hoping there is a better way. -- Posted via http://www.ruby-forum.com/. 
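
The "use all the criteria as a whole" approach mentioned in the question above can be sketched in a few lines. This is a minimal sketch in plain Ruby; `CriteriaList` is a hypothetical helper, not part of Ferret or acts_as_ferret, and it assumes each criterion is already a valid query fragment that can be combined with `AND`.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# It remembers every criterion the user has entered so far and rebuilds
# one conjunctive query string for each search step.
class CriteriaList
  def initialize
    @criteria = []
  end

  # Add one more narrowing step; blank and duplicate entries are ignored.
  # Returns self so calls can be chained.
  def narrow(criterion)
    text = criterion.to_s.strip
    @criteria << text unless text.empty? || @criteria.include?(text)
    self
  end

  # One query string that requires every criterion collected so far.
  def to_query
    @criteria.map { |c| "(#{c})" }.join(" AND ")
  end
end

search = CriteriaList.new
search.narrow("ruby")
puts search.to_query   # query for the first step
search.narrow("rails")
puts search.to_query   # narrower query for the second step
```

The resulting string would be handed to whatever search call the application already uses; the criteria list itself could live in the session between requests.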
From kraemer at webit.de Thu May 10 03:30:06 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 10 May 2007 09:30:06 +0200
Subject: [Ferret-talk] Is there a way to do incremental search?
In-Reply-To: <878f8e72959d125770be7130b5890bbb@ruby-forum.com>
References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com>
Message-ID: <20070510073006.GP9575@cordoba.webit.de>

On Thu, May 10, 2007 at 09:12:19AM +0200, Allen Young wrote:
> Say the user first enters "ruby" for search and gets 1000 results. Then
> he can search "rails" just in the 1000 results just returned.
>
> The common scenario is some kind of advanced search. User can
> incrementally add criteria and the program will narrow the results step
> by step.
>
> I know that at least I can use all the criteria as a whole to do the
> searching, but this is a waste and I'm hoping there is a better way.

Given Ferret's speed this is no waste, just try it.

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Thu May 10 03:34:05 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 10 May 2007 09:34:05 +0200
Subject: [Ferret-talk] bug when assigning new analyzer?
In-Reply-To: <01f86ed16fd2c08775af82d094131e8e@ruby-forum.com> References: <01f86ed16fd2c08775af82d094131e8e@ruby-forum.com> Message-ID: <20070510073405.GQ9575@cordoba.webit.de> On Wed, May 09, 2007 at 11:59:59PM +0200, Phillip Oertel wrote: > require 'rubygems' > require 'ferret' > include Ferret > > PATH = '/tmp/ferret_stopwords_test' > > index = Index::IndexWriter.new(:path => PATH, :create => true) > > index.analyzer = Analysis::StandardAnalyzer.new([]) > index << {:title => 'a few good men', :language => 'en'} > > index.analyzer = Analysis::StandardAnalyzer.new(['men']) > index << {:title => 'a few good men', :language => 'nl'} > > index.close > > searcher = Index::Index.new(:path => PATH) > puts searcher.search('*:men AND language:nl').total_hits > #=> 1 > > i'd expect zero results, as 'men' is a stopword at the time of indexing > with language:nl. is this a bug or a lack of understanding on my part. Queries get analyzed, too, i.e. to remove stop words from them. So you'll have to use the correct language-dependent Analyzer for your searcher, too. Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From allenmacyoung at gmail.com Thu May 10 04:05:13 2007 From: allenmacyoung at gmail.com (Allen Young) Date: Thu, 10 May 2007 10:05:13 +0200 Subject: [Ferret-talk] Is there a way to do incremental search? In-Reply-To: <20070510073006.GP9575@cordoba.webit.de> References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> Message-ID: <47080e22f82a818c4c4932521edc605a@ruby-forum.com> Jens Kraemer wrote: > Given Ferret's speed this is no waste, just try it. But what if I put a bunch of joins and conditions in find_options? Will it be still fast? I'm implementing search feature for material industry. 
You know that how many properties a material may have, or just think about a search according to the chemistry composition of the material, so many chemistry elements. Allen -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu May 10 04:24:35 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 10 May 2007 10:24:35 +0200 Subject: [Ferret-talk] Is there a way to do incremental search? In-Reply-To: <47080e22f82a818c4c4932521edc605a@ruby-forum.com> References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> Message-ID: <20070510082435.GS9575@cordoba.webit.de> On Thu, May 10, 2007 at 10:05:13AM +0200, Allen Young wrote: > Jens Kraemer wrote: > > Given Ferret's speed this is no waste, just try it. > > > But what if I put a bunch of joins and conditions in find_options? Will > it be still fast? I'm implementing search feature for material industry. > You know that how many properties a material may have, or just think > about a search according to the chemistry composition of the material, > so many chemistry elements. You should try to use ferret instead of your DB as much as possible. Joins and conditions are applied after the ferret search to further narrow down the result set and, if speed is an issue, should only be used for things that really can't go into the index, i.e. checking for user permissions. It may be possible to gain some additional speed by using the results of a previous query as some kind of scope for the next one. I.e. you could keep the ids of your result set and use them as a base for your next query. However Ferret's API does not directly support incremental queries, you'd have to implement this yourself. Not impossible but imho you should only do this once you have 'real' data so you can measure what you gain by optimizing things. Jens -- Jens Kr?mer webit! 
Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From stian at grytoyr.net Thu May 10 05:24:10 2007 From: stian at grytoyr.net (=?ISO-8859-1?Q?Stian_Gryt=F8yr?=) Date: Thu, 10 May 2007 11:24:10 +0200 Subject: [Ferret-talk] Segmentation fault on large index Message-ID: <1adf8c200705100224u774195b3x475d29deda11532d@mail.gmail.com> I'm getting a segmentation fault on a large index (15GB). I'm running ferret 0.11.4 on OpenSuSE 10.2 with ruby 1.8.6. The segmentation fault appeared after I optimized the index, see further below for the error message I got before that. Ferret works perfectly on other (smaller) indexes. Is this a known issue, and if so, is there a workaround? --------------------- after optimizing the index ----------------------- $ irb irb(main):001:0> require 'rubygems' => true irb(main):002:0> require 'ferret' => true irb(main):003:0> index = Ferret::Index::Index.new(:path => "/tmp/myindex") => ##, :path=>"/tmp/myindex", :lock_retry_time=>2, :analyzer=>#, :default_field=>:*}, @mon_owner=nil, @auto_flush=false, @open=true, @dir=#, @id_field=:id, @searcher=nil, @mon_waiting_queue=[], @reader=nil, @key=nil, @close_dir=true> irb(main):004:0> index.search_each("*:foo") {|id, score| doc = index[id].load; puts doc.inspect} /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:411: [BUG] Segmentation fault ruby 1.8.6 (2007-03-13) [i686-linux] Aborted ---------------------- before optimizing the index --------------------- IOError (IO Error occured at :93 in xraise Error occured in fs_store.c:293 - fsi_seek_i seeking pos -1175113459: ): /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:411:in `[]' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:411:in `[]' /usr/local/lib/ruby/1.8/monitor.rb:238:in `synchronize' 
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:403:in `[]' /app/controllers/search_controller.rb:133:in `do_search' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:385:in `search_each' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:384:in `search_each' /usr/local/lib/ruby/1.8/monitor.rb:238:in `synchronize' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:380:in `search_each' /app/controllers/search_controller.rb:131:in `do_search' /app/controllers/search_controller.rb:54:in `index' /usr/local/lib/ruby/1.8/benchmark.rb:293:in `measure' /app/controllers/search_controller.rb:53:in `index' /usr/local/lib/ruby/1.8/benchmark.rb:293:in `measure' /app/controllers/search_controller.rb:19:in `index' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/base.rb:1095:in `send' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/base.rb:1095:in `perform_action_without_filters' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/filters.rb:632:in `call_filter' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/filters.rb:619:in `perform_action_without_benchmark' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/benchmarking.rb:66:in `perform_action_without_rescue' /usr/local/lib/ruby/1.8/benchmark.rb:293:in `measure' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/benchmarking.rb:66:in `perform_action_without_rescue' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/rescue.rb:83:in `perform_action' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/base.rb:430:in `send' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/base.rb:430:in `process_without_filters' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/filters.rb:624:in `process_without_session_management_support' 
/usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/session_management.rb:114:in `process' /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/base.rb:330:in `process' /usr/local/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/dispatcher.rb:41:in `dispatch' /usr/local/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:168:in `process_request' /usr/local/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:143:in `process_each_request!' /usr/local/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:109:in `with_signal_handler' /usr/local/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:142:in `process_each_request!' /usr/local/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:612:in `each_cgi' /usr/local/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:609:in `each' /usr/local/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:609:in `each_cgi' /usr/local/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:141:in `process_each_request!' /usr/local/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:55:in `process!' /usr/local/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:25:in `process!' /ma/www/virtual/ferret.marketaudit.no/Site/public/dispatch.fcgi:24 -- Best regards, Stian Gryt?yr From allenmacyoung at gmail.com Thu May 10 05:43:22 2007 From: allenmacyoung at gmail.com (Allen Young) Date: Thu, 10 May 2007 11:43:22 +0200 Subject: [Ferret-talk] Is there a way to do incremental search? In-Reply-To: <20070510082435.GS9575@cordoba.webit.de> References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > You should try to use ferret instead of your DB as much as possible. > How can I use search a decimal type attribute using ferret? For example, I want all materials whose description contains "ruby" and resistance is between 0.1 and 0.5? 
I have a lot of decimal and integer type attributes which I don't know
how to index with ferret. Any advice about this?

allen

-- 
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Thu May 10 05:56:58 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 10 May 2007 11:56:58 +0200
Subject: [Ferret-talk] Is there a way to do incremental search?
In-Reply-To:
References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de>
Message-ID: <20070510095658.GB4515@cordoba.webit.de>

On Thu, May 10, 2007 at 11:43:22AM +0200, Allen Young wrote:
> Jens Kraemer wrote:
> > You should try to use ferret instead of your DB as much as possible.
> >
> How can I use search a decimal type attribute using ferret? For example,
> I want all materials whose description contains "ruby" and resistance is
> between 0.1 and 0.5? I have a lot of decimal and integer type attributes
> which I don't know how to index with ferret. Any advice about this?

Use untokenized fields for such values, and normalize them to a fixed
length before indexing. I'd also normalize the decimals to integers.

Regarding your incremental queries, the QueryFilter class might be
useful:
http://ferret.davebalmain.com/api/classes/Ferret/Search/QueryFilter.html

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From allenmacyoung at gmail.com Thu May 10 06:04:23 2007
From: allenmacyoung at gmail.com (Allen Young)
Date: Thu, 10 May 2007 12:04:23 +0200
Subject: [Ferret-talk] Is there a way to do incremental search?
In-Reply-To: <20070510095658.GB4515@cordoba.webit.de> References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de> <20070510095658.GB4515@cordoba.webit.de> Message-ID: <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> Jens Kraemer wrote: > Use untokenized fields for such values, and normalize them to a fixed > length before indexing. I'd also normalize the decimals to integers. > And how should I construct the query string? "ruby 0.1 <= resistance <= 0.5"? > Regarding your incremental queries, the QueryFilter class might be > useful: > http://ferret.davebalmain.com/api/classes/Ferret/Search/QueryFilter.html I'll check this out. Thanks a lot. allen -- Posted via http://www.ruby-forum.com/. From allenmacyoung at gmail.com Thu May 10 06:21:51 2007 From: allenmacyoung at gmail.com (Allen Young) Date: Thu, 10 May 2007 12:21:51 +0200 Subject: [Ferret-talk] Is there a way to do incremental search? In-Reply-To: <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de> <20070510095658.GB4515@cordoba.webit.de> <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> Message-ID: <42a6eee633e7b668bf3fa2dc253ee2df@ruby-forum.com> Allen Young wrote: > And how should I construct the query string? "ruby 0.1 <= resistance <= > 0.5"? Sorry, I've just realized this is a stupid question, I should use ":description(ruby) :resistance(>=0.1)". Another question. What if resistance attribute is not in the materials table but in some other table which has a one-to-one relationship with materials table? allen -- Posted via http://www.ruby-forum.com/. 
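
The advice quoted above (fixed length, decimals turned into integers) matters because an untokenized field compares values as strings, so "10" would sort before "2". A minimal sketch of such a normalization in plain Ruby follows. The helper name, the scale of 1000 and the width of 10 digits are illustrative assumptions, not anything prescribed by Ferret or acts_as_ferret, and the bracketed range string at the end reflects Ferret's query syntax as I understand it.

```ruby
# Hypothetical helper; name, SCALE and WIDTH are illustrative choices.
SCALE = 1000   # keep three decimal places by multiplying before rounding
WIDTH = 10     # zero-pad every value to ten digits

def normalize_decimal(value, scale = SCALE, width = WIDTH)
  scaled = (value.to_f * scale).round
  # Negative numbers would need an offset first; this sketch rejects them.
  raise ArgumentError, "negative values are not handled" if scaled < 0
  raise ArgumentError, "value does not fit in #{width} digits" if scaled.to_s.length > width
  scaled.to_s.rjust(width, "0")
end

low  = normalize_decimal(0.1)
high = normalize_decimal(0.5)

# Assumed query syntax for an inclusive range on the normalized field.
puts "description:ruby AND resistance:[#{low} #{high}]"
```

The same helper would have to be applied both when the value is indexed (for instance in the method whose name is listed as a field) and when the user's bounds are turned into a query, otherwise the string comparison no longer matches the numeric order.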
From kraemer at webit.de Thu May 10 06:30:31 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 10 May 2007 12:30:31 +0200
Subject: [Ferret-talk] Is there a way to do incremental search?
In-Reply-To: <42a6eee633e7b668bf3fa2dc253ee2df@ruby-forum.com>
References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de> <20070510095658.GB4515@cordoba.webit.de> <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> <42a6eee633e7b668bf3fa2dc253ee2df@ruby-forum.com>
Message-ID: <20070510103031.GD4515@cordoba.webit.de>

On Thu, May 10, 2007 at 12:21:51PM +0200, Allen Young wrote:
> Allen Young wrote:
> > And how should I construct the query string? "ruby 0.1 <= resistance <=
> > 0.5"?
> Sorry, I've just realized this is a stupid question, I should use
> ":description(ruby) :resistance(>=0.1)".

you can also construct your query objects manually, in this case check
out RangeQuery:
http://ferret.davebalmain.com/api/classes/Ferret/Search/RangeQuery.html

> Another question. What if resistance attribute is not in the materials
> table but in some other table which has a one-to-one relationship with
> materials table?

define a method that retrieves the value and add the method's name as a
field's name to your call to acts_as_ferret. You might want to search
the list for 'indexed method' for an example of this.

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From allenmacyoung at gmail.com Thu May 10 06:48:15 2007
From: allenmacyoung at gmail.com (Allen Young)
Date: Thu, 10 May 2007 12:48:15 +0200
Subject: [Ferret-talk] Is there a way to do incremental search?
In-Reply-To: <20070510103031.GD4515@cordoba.webit.de> References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de> <20070510095658.GB4515@cordoba.webit.de> <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> <42a6eee633e7b668bf3fa2dc253ee2df@ruby-forum.com> <20070510103031.GD4515@cordoba.webit.de> Message-ID: <24845447586197dc05f211ba16d99ec8@ruby-forum.com> Jens Kraemer wrote: > On Thu, May 10, 2007 at 12:21:51PM +0200, Allen Young wrote: >> Allen Young wrote: >> Another question. What if resistance attribute is not in the materials >> table but in some other table which has a one-to-one relationship with >> materials table? > > define a method that retrieves the value and add the method's name as a > field's name to your call to acts_as_ferret. You might want to search > the list for 'indexed method' for an example of this. > There are about 100 attributes reside in several different tables. That means I need to define all this methods manually? -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu May 10 07:31:40 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 10 May 2007 13:31:40 +0200 Subject: [Ferret-talk] Is there a way to do incremental search? 
In-Reply-To: <24845447586197dc05f211ba16d99ec8@ruby-forum.com>
References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de> <20070510095658.GB4515@cordoba.webit.de> <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> <42a6eee633e7b668bf3fa2dc253ee2df@ruby-forum.com> <20070510103031.GD4515@cordoba.webit.de> <24845447586197dc05f211ba16d99ec8@ruby-forum.com>
Message-ID: <20070510113140.GE4515@cordoba.webit.de>

On Thu, May 10, 2007 at 12:48:15PM +0200, Allen Young wrote:
> Jens Kraemer wrote:
> > On Thu, May 10, 2007 at 12:21:51PM +0200, Allen Young wrote:
> >> Allen Young wrote:
> >> Another question. What if resistance attribute is not in the materials
> >> table but in some other table which has a one-to-one relationship with
> >> materials table?
> >
> > define a method that retrieves the value and add the method's name as a
> > field's name to your call to acts_as_ferret. You might want to search
> > the list for 'indexed method' for an example of this.
> >
> There are about 100 attributes reside in several different tables. That
> means I need to define all this methods manually?

that depends - some metaprogramming might help make it a less daunting
task. e.g.:

class OtherClass
  # define which fields you want to have indexed and how
  # (class-level method, so MyModel can call OtherClass.ferret_fields):
  def self.ferret_fields
    { :field1 => { :store => :yes }, ... }
  end
end

class MyModel
  # collect field list for aaf
  ferret_fields = { :name => {}, ... }
  # Hash#update merges the other class's fields in place
  ferret_fields.update OtherClass.ferret_fields
  acts_as_ferret :fields => ferret_fields

  # define getters
  OtherClass.ferret_fields.keys.each do |field|
    define_method :"ferret_#{field}" do
      other_object.send(field)
    end
  end
end

you can also join various (textual) attributes together and let them
form a single field in the index where this is appropriate.
Indexing data from a lot of relationships is not trivial when it comes
to updating the index - whenever some record changes, its parent
object(s) (which act as the root and go into the Ferret index in the
first place) have to be re-indexed.

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From msdaly at gmail.com Thu May 10 14:48:10 2007
From: msdaly at gmail.com (Mike Daly)
Date: Thu, 10 May 2007 20:48:10 +0200
Subject: [Ferret-talk] Large index performance = 8x decrease
Message-ID: <8d068ccd730a0c4e967cacd8ca66ee6a@ruby-forum.com>

hi,

i'm indexing a really large db table (~4.2 million rows). i've noticed
that after ~2M records, index performance decreases by almost an order
of magnitude. full dataset graph here:

http://i122.photobucket.com/albums/o244/spokeo/indexer-data.jpg

here's a couple best-fit lines that represent the data points:

0-2M  : y = 78.65655x + 144237.5
2.5M+ : y = 10.79832x + 1980630

the part that strikes me as most odd is the bend between 2M and 2.5M. i
haven't read the ferret algorithm, but i would expect either linear or
hyperbolic performance over time. however, the graph seems to indicate a
particular breaking point after which performance is cut by 8x.

is this behavior normal/expected? is there anything i could be doing to
speed up an index of this size? (the index grows to ~12G while indexing,
then gets shrunk to ~6G by the optimization)

thanks for the help!
-m

--- MODEL CODE

class MyModel < ActiveRecord::Base
  # think of body/title in terms of an average blog
  acts_as_ferret :fields => { 'body' => {}, 'title' => { :boost => 2 } }
end

--- INDEX CODE

index = Ferret::Index::Index.new(MyModel.aaf_configuration[:ferret].dup.update(:auto_flush => false, :field_infos => MyModel.aaf_index.field_infos, :create => true))
n = 0
BATCH_SIZE = 1000
while true # new index from scratch
  records = MyModel.find(:all, :limit => BATCH_SIZE, :offset => n, :select => "id,#{MyModel.aaf_configuration[:ferret_fields].keys.join(',')}")
  break if (!records || records.length == 0)
  records.each do |record|
    index << record.to_doc # aaf method
  end
  n += BATCH_SIZE
end
index.flush
index.optimize # 30+ minutes =(
index.close

--- CONFIG

> gem list | grep ferret
acts_as_ferret (0.4.0)
ferret (0.11.3)

> uname -a
Linux gentoo 2.6.20-hardened #3 SMP Fri Mar 30 19:27:10 UTC 2007 x86_64 Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux

-- 
Posted via http://www.ruby-forum.com/.

From sashthebash at gmail.com Thu May 10 16:38:42 2007
From: sashthebash at gmail.com (sashthebash)
Date: Thu, 10 May 2007 22:38:42 +0200
Subject: [Ferret-talk] Limiting objects used in index (don't show private data)
Message-ID: <087fa35c28a92c96e50783c13a77fcf4@ruby-forum.com>

Hi,

I am just starting to use aaf. I didn't find any information on this...
my question is, if I can exclude some items from being indexed?

I have events... and I have private events (in one table). The private
events should not be indexed, otherwise everybody will be able to see
them in the search. How can I prevent those from showing up?

Thanks!

-- 
Posted via http://www.ruby-forum.com/.
From msdaly at gmail.com Thu May 10 20:00:46 2007
From: msdaly at gmail.com (Mike Daly)
Date: Fri, 11 May 2007 02:00:46 +0200
Subject: [Ferret-talk] Limiting objects used in index (don't show private data)
In-Reply-To: <087fa35c28a92c96e50783c13a77fcf4@ruby-forum.com>
References: <087fa35c28a92c96e50783c13a77fcf4@ruby-forum.com>
Message-ID: <938b0b6641b85a0ab211d01e721a1881@ruby-forum.com>

http://www.ruby-forum.com/topic/100218

-- 
Posted via http://www.ruby-forum.com/.

From tinaherrmann at gmail.com Fri May 11 18:05:39 2007
From: tinaherrmann at gmail.com (Tina)
Date: Sat, 12 May 2007 00:05:39 +0200
Subject: [Ferret-talk] Memory leak Windows XP SP2 related to search involving 'ä'
Message-ID: <815dc27deae7c18f984874ca1b7b5977@ruby-forum.com>

Hi there -

I have read through the posts here that seem related to this problem and
tried all the suggested solutions. None of them seem to fix my problem:

I am running Windows XP SP2, MySql 5.2, ruby-1.8.5, ferret-0.11.4
Right now I am still running everything in WEBrick

I set up my system according to this:
http://www.dockblog.de/2007/04/01/rubyrails-setup-in-unicode-utf8-with-ferret-search-mysql-under-debian-etch-with-apache2/
(as far as I could since I am not running Linux and Apache2)

When doing a full_text_search that involves a word with an 'ä', the
ruby.exe process goes nuts and keeps eating up memory until the system
crashes (or I kill the process). Interestingly, this does not happen for
any other Umlaut.

I am guessing that the problem may have something to do with the locale
of Windows. I tried to play around with the 'Regional and Language
Options', but nothing seemed to help.

I would be most grateful for any help or suggestions on this!

Thanks,
Tina

-- 
Posted via http://www.ruby-forum.com/.
From jeremy at hinegardner.org Sun May 13 23:48:24 2007
From: jeremy at hinegardner.org (Jeremy Hinegardner)
Date: Sun, 13 May 2007 21:48:24 -0600
Subject: [Ferret-talk] Ferret Query Language as categorizer?
In-Reply-To: <1adf8c200705100224u774195b3x475d29deda11532d@mail.gmail.com>
References: <1adf8c200705100224u774195b3x475d29deda11532d@mail.gmail.com>
Message-ID: <20070514034824.GC5494@hinegardner.org>

Hi all,

I'm looking at using Ferret for categorizing documents.
Essentially what I have are thousands of query rules; if a document
matches a rule, then it belongs to the category that is associated with
that rule. Normally what we all do is have documents indexed and then run
a query against the index to get back the documents that match the query.

What I want to do is the inverse. I have thousands of queries and I
want to run all of them against one document at a time. The queries
that match the document essentially categorize the document into the
associated category.

Yes, I am aware that this may not be the best way to approach a
categorization problem, but it is a portion of how our current system
works and I want to investigate ways to replace it and move on to a
better mechanism for categorization.

I'm considering using our current query language and having it be a DSL
that generates Ruby code.

My first whack at using Ferret for this was essentially the following:

doc = File.read(OPTIONS.input_file)
Ferret::I.new do |index|
  index << doc
  FasterCSV.foreach(OPTIONS.category_csv,{ :headers => headers }) do |row|
    next unless row[:boolean]
    top_docs = index.search(row[:boolean])
    if top_docs.hits.size > 0 then
      puts "Matches : #{row[:name]}"
    end
  end
end

Short and sweet eh? Basically I'm looking for suggestions on better
ways to have thousands of ferret queries (as FQL) run against a
single document. Are there other approaches that would be better? API
calls that would do this more efficiently?
Means to serialize FQL so that it doesn't have to be parsed?

Thoughts, comments, rants, raves, brainstorms?

enjoy,

-jeremy

-- 
========================================================================
 Jeremy Hinegardner                              jeremy at hinegardner.org

From JanPrill at blauton.de Mon May 14 04:00:02 2007
From: JanPrill at blauton.de (Jan Prill)
Date: Mon, 14 May 2007 08:00:02 +0000
Subject: [Ferret-talk] Ferret Query Language as categorizer?
In-Reply-To: <20070514034824.GC5494@hinegardner.org>
References: <1adf8c200705100224u774195b3x475d29deda11532d@mail.gmail.com> <20070514034824.GC5494@hinegardner.org>
Message-ID: <562a35c10705140100r109fe0bauea8d3b785f908725@mail.gmail.com>

Hi Jeremy,

interesting approach. You might build your Query objects once by
calling QueryParser#parse, serialize these Query objects, and reuse
them.

IMHO your problem wouldn't be query parsing but the number of queries
that you are issuing on each document. On the other hand, Ferret is
quite fast, and it may work out if your process is not that time
critical. Have you considered combining queries? Ferret's Query
Language is quite powerful, and you might bring down the number of
queries if you combine the queries that are useful to only one
categorization anyway. Check out the QueryParser API regarding this
approach.

At least the lines

    top_docs = index.search(row[:boolean])
    if top_docs.hits.size > 0 then

should read "if index.search(row[:boolean]).total_hits > 0" so that you
don't need to read in the hits array to get the size.

As a last tip, you might be interested in the underlying code of the
more_like_this method of aaf to get the most used terms in your
documents. This might let your categorizations "learn" while documents
get categorized.

Cheers,
Jan

-- 
Jan Prill
Rechtsanwalt
Grünebergstraße 38
22763 Hamburg
Tel +49 (0)40 41265809
Fax +49 (0)40 380178-73022
Mobil +49 (0)171 3516667
http://www.inviado.de

From allenmacyoung at gmail.com Mon May 14 04:36:10 2007
From: allenmacyoung at gmail.com (Allen Young)
Date: Mon, 14 May 2007 10:36:10 +0200
Subject: [Ferret-talk] Naming conflict between Ferret and my own model class
Message-ID: <6cbcabb0a526a45c86dd2e7f31720aaa@ruby-forum.com>

My model for maintaining searches in the database is called "Search",
and Ferret also has a module called "Search". When I include Ferret in
my "Search" model class, a naming conflict arises and I get the
following error:

    undefined method `new' for Ferret::Search:Module

in the SearchesController. This is because I want to create an instance
of my "Search" model class there. I don't really want to rename my
model class. Is there any other solution?

Thanks for the help.

-- 
Posted via http://www.ruby-forum.com/.

From alex at blackkettle.org Mon May 14 06:11:50 2007
From: alex at blackkettle.org (Alex Young)
Date: Mon, 14 May 2007 11:11:50 +0100
Subject: [Ferret-talk] Ferret Query Language as categorizer?
In-Reply-To: <20070514034824.GC5494@hinegardner.org>
References: <1adf8c200705100224u774195b3x475d29deda11532d@mail.gmail.com> <20070514034824.GC5494@hinegardner.org>
Message-ID: <464835E6.9010008@blackkettle.org>

Jeremy Hinegardner wrote:
> Hi all,
>
> I'm looking at using Ferret for categorizing documents.
> Essentially what I have are thousands of query rules that if a document
> matches, then it belongs to the category that is associated with that
> rule. Normally what we all do is have documents indexed and then run a
> query against the index to get back the documents that match the query.
>
> What I want to do is the inverse. I have thousands of queries and I
> want to run all of them against one document at a time. The queries
> that match the document essentially categorize the document into the
> associated category.

> Thoughts, comments, rants, raves, brainstorms?

Random thought that might or might not work, depending on whether your
queries are simple enough and how much data you want back: just invert
the problem. Store the queries in Ferret, and treat your document as
the query. Random example:

    irb(main):015:0> index = Index::Index.new
    irb(main):016:0> index << "hat"
    irb(main):017:0> index << "fox"
    irb(main):018:0> doc = "the quick brown fox jumped over the lazy dog"
    irb(main):018:0> index.search_each(doc) { |id, score| puts index[id].load.to_yaml + score.to_s }
    --- !map:Ferret::Index::LazyDoc
    :id: fox
    0.0425622686743736
    => 1

I've got absolutely no idea how well the query parser will handle larger
documents, but it's worth a try...

-- 
Alex

From allenmacyoung at gmail.com Mon May 14 07:18:13 2007
From: allenmacyoung at gmail.com (Allen Young)
Date: Mon, 14 May 2007 13:18:13 +0200
Subject: [Ferret-talk] Is there a way to do incremental search?
In-Reply-To: <20070510113140.GE4515@cordoba.webit.de>
References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de> <20070510095658.GB4515@cordoba.webit.de> <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> <42a6eee633e7b668bf3fa2dc253ee2df@ruby-forum.com> <20070510103031.GD4515@cordoba.webit.de> <24845447586197dc05f211ba16d99ec8@ruby-forum.com> <20070510113140.GE4515@cordoba.webit.de>
Message-ID: <49cdb5b33af5ab3787071040123c2743@ruby-forum.com>

Jens Kraemer wrote:
> On Thu, May 10, 2007 at 12:48:15PM +0200, Allen Young wrote:
>> >
>> There are about 100 attributes residing in several different tables. That
>> means I need to define all these methods manually?
>
> that depends - some metaprogramming might help make it a less daunting
> task.
>
> i.e.
>
> class OtherClass
>   # define which fields you want to have indexed and how:
>   def ferret_fields
>     { :field1 => { :store => :yes }, ... }
>   end
> end
>
> class MyModel
>   # collect field list for aaf
>   ferret_fields = { :name => {}, ... }
>   ferret_fields.update! OtherClass.ferret_fields
>
>   acts_as_ferret :fields => ferret_fields
>
>   # define getters
>   OtherClass.ferret_fields.keys.each do |field|
>     define_method :"ferret_#{field}" do

I think this should be :"#{field}", or else I must define ferret_fields =
{ :ferret_field1 } in OtherClass instead.

>       other_object.send(field)
>     end
>   end
> end
>

It seems that define_method doesn't work at all. I got many errors
saying that there is no method defined for things like ferret_field1. I
even tried to define a simple method dynamically with the following
code at the class level of my model class:

    define_method :say_hello { puts 'hello' }

But unexpectedly, method_missing is called, which suggests that
define_method doesn't work. Is there anything wrong with how I'm using
define_method?

allen

-- 
Posted via http://www.ruby-forum.com/.
From allenmacyoung at gmail.com Mon May 14 07:42:26 2007
From: allenmacyoung at gmail.com (Allen Young)
Date: Mon, 14 May 2007 13:42:26 +0200
Subject: [Ferret-talk] Is there a way to do incremental search?
In-Reply-To: <49cdb5b33af5ab3787071040123c2743@ruby-forum.com>
References: <878f8e72959d125770be7130b5890bbb@ruby-forum.com> <20070510073006.GP9575@cordoba.webit.de> <47080e22f82a818c4c4932521edc605a@ruby-forum.com> <20070510082435.GS9575@cordoba.webit.de> <20070510095658.GB4515@cordoba.webit.de> <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> <42a6eee633e7b668bf3fa2dc253ee2df@ruby-forum.com> <20070510103031.GD4515@cordoba.webit.de> <24845447586197dc05f211ba16d99ec8@ruby-forum.com> <20070510113140.GE4515@cordoba.webit.de> <49cdb5b33af5ab3787071040123c2743@ruby-forum.com>
Message-ID: <03a8cd60b40d7f2e7ccf0080b6c14fb7@ruby-forum.com>

Allen Young wrote:
> It seems that define_method doesn't work at all. I got many errors
> saying that there is no method defined for things like ferret_field1. I
> even tried to define a simple method dynamically with the following
> code at the class level of my model class.
>
> define_method :say_hello { puts 'hello' }
>
> But unexpectedly, method_missing is called, so it means that
> define_method doesn't work. Is there anything wrong about my using with
> define_method?
>

Well, I got something. If I write define_method this way:

    define_method(:say_hello) { puts 'hello' }

then the "say_hello" method can be defined dynamically. Quite strange!
But I still cannot figure out why

    OtherClass.ferret_fields.keys.each do |field|
      define_method :"ferret_#{field}" do
        other_object.send(field)
      end
    end

doesn't work. Is this because define_method is in the context of
OtherClass and thus dynamically adds all the methods to OtherClass?

allen

-- 
Posted via http://www.ruby-forum.com/.
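[Editorial sketch] The define_method behaviour discussed in this thread
can be tried in plain Ruby, without Rails or acts_as_ferret. In the
sketch below, Other, Model, and FIELDS are hypothetical stand-ins (not
names from the thread's application); the only assumption is that
other_object returns something responding to the listed fields. A
brace block binds more tightly than do...end, so without parentheses
it attaches to the last argument rather than to define_method; the
parenthesized and do...end forms both attach the block to
define_method.

```ruby
# Hypothetical stand-in for the associated object (assumption, not from the thread).
class Other
  def title; "hello"; end
  def body;  "world"; end
end

class Model
  # Hypothetical field list; in the thread this came from OtherClass.ferret_fields.keys.
  FIELDS = [:title, :body]

  def other_object
    @other_object ||= Other.new
  end

  # Inside the class body self is Model, so the getters are defined on Model.
  FIELDS.each do |field|
    define_method(:"ferret_#{field}") do
      other_object.send(field)
    end
  end

  # With parentheses the brace block is passed to define_method itself.
  define_method(:say_hello) { 'hello' }
end

model = Model.new
puts model.ferret_title
puts model.ferret_body
puts model.say_hello
```

Because the loop runs in the body of Model, the generated methods land
on Model, not on the class that supplied the field names.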
From kraemer at webit.de Mon May 14 08:15:41 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 14 May 2007 14:15:41 +0200
Subject: [Ferret-talk] Is there a way to do incremental search?
In-Reply-To: <03a8cd60b40d7f2e7ccf0080b6c14fb7@ruby-forum.com>
References: <20070510082435.GS9575@cordoba.webit.de> <20070510095658.GB4515@cordoba.webit.de> <1b94cc89a6d8196c10956a8aa3234736@ruby-forum.com> <42a6eee633e7b668bf3fa2dc253ee2df@ruby-forum.com> <20070510103031.GD4515@cordoba.webit.de> <24845447586197dc05f211ba16d99ec8@ruby-forum.com> <20070510113140.GE4515@cordoba.webit.de> <49cdb5b33af5ab3787071040123c2743@ruby-forum.com> <03a8cd60b40d7f2e7ccf0080b6c14fb7@ruby-forum.com>
Message-ID: <20070514121541.GE27534@cordoba.webit.de>

On Mon, May 14, 2007 at 01:42:26PM +0200, Allen Young wrote:
> Allen Young wrote:
> > It seems that define_method doesn't work at all. I got many errors
> > saying that there is no method defined for things like ferret_field1. I
> > even tried to define a simple method dynamically with the following
> > code at the class level of my model class.
> >
> > define_method :say_hello { puts 'hello' }
> >
> > But unexpectedly, method_missing is called, so it means that
> > define_method doesn't work. Is there anything wrong about my using with
> > define_method?
> >
> Well, I got something. If I write define_method this way:
>
> define_method(:say_hello) { puts 'hello' }

syntax weirdness ...

> then the "say_hello" method can be defined dynamically. Quite strange!
> But I still cannot figure out why
>
> OtherClass.ferret_fields.keys.each do |field|
>   define_method :"ferret_#{field}" do
>     other_object.send(field)
>   end
> end
>
> doesn't work. Is this because define_method is in the context of
> OtherClass and thus dynamically adds all the methods to OtherClass?

no, but you could try to add the parens there as well:

    define_method(:"ferret_#{field}") do ...

Jens

-- 
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From JanPrill at blauton.de Mon May 14 08:30:03 2007
From: JanPrill at blauton.de (Jan Prill)
Date: Mon, 14 May 2007 12:30:03 +0000
Subject: [Ferret-talk] Ferret as an object database
Message-ID: <562a35c10705140530s72296bb1uc017b386159a18bf@mail.gmail.com>

Hi List and especially Dave,

I just happened to remember a discussion on the mailing list in which
you (Dave) were thinking about, and maybe even working on, Ferret
becoming a kind of object database that fully circumvents the SQL store
on occasions where this might be appropriate
(http://www.ruby-forum.com/topic/82086#142613).

I'm using Ferret quite heavily at the moment for a private project of
mine and would love to put away the SQL store for my document management
approach. Storing huge amounts of data in the SQL layer as well as in
the Ferret index simply doesn't feel right (not to mention staying DRY).

So I wonder if you are making any progress on this, and I would even
encourage you to put out a call for donations on this specific issue.
Unfortunately I won't be able to donate much myself, but I'm pretty sure
that there is demand for something like this, especially in my knowledge
domain: the knowledge management of legal documents (jurisdiction, laws,
legal science).

Thanks for Ferret!!!

Cheers,
Jan Prill

-- 
Jan Prill
Rechtsanwalt
Grünebergstraße 38
22763 Hamburg
Tel +49 (0)40 41265809
Fax +49 (0)40 380178-73022
Mobil +49 (0)171 3516667
http://www.inviado.de

From hakita at gmail.com Mon May 14 11:17:04 2007
From: hakita at gmail.com (py landanger)
Date: Mon, 14 May 2007 17:17:04 +0200
Subject: [Ferret-talk] How to make a Tag cloud with Ferret ?
Message-ID: <133fb1092bb336168950799ee4d71399@ruby-forum.com>

Hello,

I want to make a tag cloud using Ferret.
How can I do so?

I would need to know the number of occurrences of each word in the
index.

Thank you

-- 
Posted via http://www.ruby-forum.com/.

From john at johnleach.co.uk Mon May 14 11:24:54 2007
From: john at johnleach.co.uk (John Leach)
Date: Mon, 14 May 2007 16:24:54 +0100
Subject: [Ferret-talk] How to make a Tag cloud with Ferret ?
In-Reply-To: <133fb1092bb336168950799ee4d71399@ruby-forum.com>
References: <133fb1092bb336168950799ee4d71399@ruby-forum.com>
Message-ID: <1179156294.3867.21.camel@localhost.localdomain>

Hello,

See the TermEnum documentation:

http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html

John.

On Mon, 2007-05-14 at 17:17 +0200, py landanger wrote:
> Hello,
>
> I want to make a TAG CLOUD using ferret.
> How can i do so ?
>
> I would need to know the amount of keyword for every each words in the
> index.
>
>
> Thank you
>
-- 
http://johnleach.co.uk

From dburkes at netable.com Mon May 14 13:53:31 2007
From: dburkes at netable.com (Danny Burkes)
Date: Mon, 14 May 2007 19:53:31 +0200
Subject: [Ferret-talk] A Ferret/AAF success story
Message-ID: <4807400b8c3b7b00f439b892d5bc6450@ruby-forum.com>

Hello everyone-

We recently added full archives search to Lingr (http://www.lingr.com),
and we used Ferret/AAF to do it.

I've written a blog post with some details of that integration, and I
thought some of you might be interested. See
http://blog.lingr.com/2007/05/we_heart_ferret.html.

I'm grateful to the authors of Ferret and AAF, as well as to all the
people in this forum who helped me stumble through the integration.
I hope I can give something back here to even things up :-)

Best Regards,

Danny

-- 
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Mon May 14 14:26:32 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 14 May 2007 20:26:32 +0200
Subject: [Ferret-talk] A Ferret/AAF success story
In-Reply-To: <4807400b8c3b7b00f439b892d5bc6450@ruby-forum.com>
References: <4807400b8c3b7b00f439b892d5bc6450@ruby-forum.com>
Message-ID: <20070514182632.GH27534@cordoba.webit.de>

On Mon, May 14, 2007 at 07:53:31PM +0200, Danny Burkes wrote:
> Hello everyone-
>
> We recently added full archives search to Lingr (http://www.lingr.com),
> and we used Ferret/AAF to do it.

cool :-) I really like the result clustering.

Small issue: The statement 'Displaying the top 100' was confusing at
first since only 7 or so rooms were displayed. Maybe additionally
mentioning the number of rooms could help to clear things up a bit?
I.e.: 'Displaying 100 messages from 7 rooms'

> I've written a blog post with some details of that integration, and I
> thought some of you might be interested. See
> http://blog.lingr.com/2007/05/we_heart_ferret.html.
>
> I'm grateful to the authors of Ferret and AAF, as well as to all the
> people in this forum who helped me stumble through the integration. I
> hope I can give something back here to even things up :-)

I think that multi-lingual tokenizer could be useful for other users of
Ferret, too ;-)

cheers,
Jens

-- 
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From dburkes at netable.com Mon May 14 14:47:01 2007
From: dburkes at netable.com (Danny Burkes)
Date: Mon, 14 May 2007 20:47:01 +0200
Subject: [Ferret-talk] A Ferret/AAF success story
In-Reply-To: <20070514182632.GH27534@cordoba.webit.de>
References: <4807400b8c3b7b00f439b892d5bc6450@ruby-forum.com> <20070514182632.GH27534@cordoba.webit.de>
Message-ID: <95a8e0ab09dd1f5f29b5b544cc40435c@ruby-forum.com>

> Small issue: The statement 'Displaying the top 100' was confusing at
> first since only 7 or so rooms were displayed. Maybe additionally
> mentioning the number of rooms could help to clear things up a bit?
> I.e.: 'Displaying 100 messages from 7 rooms'
>

Excellent suggestion- I'll implement it now :-)

> I think that multi-lingual tokenizer could be useful for other users of
> Ferret, too ;-)
>

We'll definitely open-source that - just let me clean it up a bit, then
we'll post it at http://svn.lingr.com. I'll make an announcement here
when it's ready.

Best Regards,

Danny

-- 
Posted via http://www.ruby-forum.com/.

From jeremy at hinegardner.org Mon May 14 14:55:57 2007
From: jeremy at hinegardner.org (Jeremy Hinegardner)
Date: Mon, 14 May 2007 12:55:57 -0600
Subject: [Ferret-talk] Ferret Query Language as categorizer?
In-Reply-To: <464835E6.9010008@blackkettle.org>
References: <1adf8c200705100224u774195b3x475d29deda11532d@mail.gmail.com> <20070514034824.GC5494@hinegardner.org> <464835E6.9010008@blackkettle.org>
Message-ID: <20070514185557.GI5494@hinegardner.org>

On Mon, May 14, 2007 at 11:11:50AM +0100, Alex Young wrote:
> Jeremy Hinegardner wrote:
> >Hi all,
> >
> >I'm looking at using Ferret for categorizing documents.
> >Essentially what I have are thousands of query rules that if a document
> >matches, then it belongs to the category that is associated with that
> >rule. Normally what we all do is have documents indexed and then run a
> >query against the index to get back the documents that match the query.
> >
> >What I want to do is the inverse. I have thousands of queries and I
> >want to run all of them against one document at a time. The queries
> >that match the document essentially categorize the document into the
> >associated category.
>
> >Thoughts, comments, rants, raves, brainstorms?
> Random thought that might or might not work, depending on whether your
> queries are simple enough and how much data you want back: just invert
> the problem. Store the queries in Ferret, and treat your document as
> the query. Random example:
>
> irb(main):015:0> index = Index::Index.new
> irb(main):016:0> index << "hat"
> irb(main):017:0> index << "fox"
> irb(main):018:0> doc = "the quick brown fox jumped over the lazy dog"
> irb(main):018:0> index.search_each(doc) { |id, score| puts
> index[id].load.to_yaml + score.to_s }
> --- !map:Ferret::Index::LazyDoc
> :id: fox
> 0.0425622686743736
> => 1
>
> I've got absolutely no idea how well the query parser will handle larger
> documents, but it's worth a try...

I did give some thought to this, but we have some fairly complex
categorization queries, some of which are the equivalent of
SpanTermQuery. Since there is no FQL for those types of queries yet, I
don't think your approach will work for me. But it is a good idea.

enjoy,

-jeremy

-- 
========================================================================
 Jeremy Hinegardner                              jeremy at hinegardner.org

From jeremy at hinegardner.org Mon May 14 15:01:03 2007
From: jeremy at hinegardner.org (Jeremy Hinegardner)
Date: Mon, 14 May 2007 13:01:03 -0600
Subject: [Ferret-talk] Ferret Query Language as categorizer?
In-Reply-To: <562a35c10705140100r109fe0bauea8d3b785f908725@mail.gmail.com>
References: <1adf8c200705100224u774195b3x475d29deda11532d@mail.gmail.com> <20070514034824.GC5494@hinegardner.org> <562a35c10705140100r109fe0bauea8d3b785f908725@mail.gmail.com>
Message-ID: <20070514190103.GJ5494@hinegardner.org>

On Mon, May 14, 2007 at 08:00:02AM +0000, Jan Prill wrote:
> interesting approach. You might build your Query-Objects once by calling
> QueryParser#parse, serialize these Query-Objects and use them.

Yup, that's one item I need to look into. One of the issues is that the
query language we're using right now has 'NEAR' keywords, so we'll need
to convert those into SpanTermQuerys. I'm thinking of having the DSL
generate Ruby code, and then either serializing those Query objects or
maybe just running them as code.

> IMHO your Problem wouldn't be query parsing but the amount of queries that
> you are issuing on each document. On the other hand ferret is quite fast
> and it may work out if your process is not that time critical. Have you
> considered to combine queries. Ferrets Query Language is quite powerful and
> you might bring down the number of queries if you combine the queries that
> are useful to only one categorization anyway. Check out the QueryParser API
> regarding this approach.

I will investigate the API more. Currently we don't have multiple
queries that equate to a single category; it's a one-to-one relationship
between category and query. The speed of my initial experiments is
within our tolerances, but may not be good enough for serial execution.
Of course, since all of this is in a single in-memory index per
document, it could be parallelized.

> At least the lines
>
> top_docs = index.search(row[:boolean])
> if top_docs.hits.size > 0 then
>
> should read "if index.search(row[:boolean]).total_hits > 0" so that you
> don't need to read in the hits-array to get the size.

Good tip, thanks.
> As a last tip you might be interested in the underlying code of the
> more_like_this method of aaf to get the most used terms in your documents.
> This might be able to let your categorizations "learn" while documents get
> categorized.

I will definitely check more into that. Who knows, maybe a
categorization engine based on Ferret will fall out of this :-)

enjoy,

-jeremy

-- 
========================================================================
 Jeremy Hinegardner                              jeremy at hinegardner.org

From j_d_robbins at yahoo.com Tue May 15 11:37:19 2007
From: j_d_robbins at yahoo.com (Jacob Robbins)
Date: Tue, 15 May 2007 17:37:19 +0200
Subject: [Ferret-talk] more_like_this
In-Reply-To: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com>
References: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com>
Message-ID: <8bf9a47da8d09f2a6ee7f65d59ac3ba6@ruby-forum.com>

Rob Lee wrote:
>
> Item.more_like_this(:field_names => [:title, :description, :status],
> :append_to_query => Proc .... )
>

I don't mean to be nitpicky, but more_like_this is an instance method,
not a class method. This has come up for me because more_like_this does
not work for unsaved records in the current AAF, which doesn't mesh with
the Rails convention of creating a new ActiveRecord object to store
user query params. I'd like to make a regular Rails form using a blank
object and then call more_like_this on that object to do a search.

-- 
Posted via http://www.ruby-forum.com/.

From levent at leventali.com Tue May 15 11:39:41 2007
From: levent at leventali.com (Levent Ali)
Date: Tue, 15 May 2007 16:39:41 +0100
Subject: [Ferret-talk] how to index and search multiple foreign keys in aaf
Message-ID: <76685bc50705150839k28bffb7bq9e71b7f7626a4228@mail.gmail.com>

Let's say I have a class Job that has a series of possible locations
that are stored as a sequence of foreign keys:

    class Job
      acts_as_ferret {
        :locations => {}
      }

      def locations
        "1,25,23,15"
        # ??
      end
    end

Is that a good way to store the location ids?
How would I search for Jobs in a certain location?

    Job.find_by_contents('locations(25)')

cheers

From steven at housecafemusic.com Tue May 15 12:01:07 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 15 May 2007 18:01:07 +0200
Subject: [Ferret-talk] AAF quirks in production mode
Message-ID: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com>

So my ferret behaves nicely in dev mode, but I just deployed and now he
is not happy!

First thing I noticed was that in general my app would not start with a
folder called "development" in my index folder, so I changed it to
"production" and now my app is functioning.

However, when I try to search I get errors, which you can see here:

http://pastie.caboo.se/61767

I'm no guru, but I would guess that my little stunt with renaming the
directory wasn't kosher enough for AAF.

What should I do?

From steven at housecafemusic.com Tue May 15 12:15:45 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 15 May 2007 18:15:45 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com>
Message-ID:

Just found another quirk: any time I save or update something I get an
error:

http://pastie.caboo.se/61773

I will assume that this all ties into my tomfoolery with directories.

From kraemer at webit.de Tue May 15 13:03:05 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 15 May 2007 19:03:05 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com>
Message-ID: <20070515170305.GA8905@cordoba.webit.de>

On Tue, May 15, 2007 at 06:01:07PM +0200, Steven Garcia wrote:
> So my ferret behaves nicely in dev mode, but I just deployed and now
> he is not happy!
>
> First thing I noticed was that in general my app would not start with
> a folder called "development" in my index folder, so I changed it to
> "production" and now my app is functioning.

what was the error before you renamed that directory?

In general, AAF does not rely on you to create the production directory;
it can do this on its own given proper access rights. Most of the time
the problem is just that Ferret cannot access the index directory
because of missing write permissions.

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Tue May 15 13:05:12 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 15 May 2007 19:05:12 +0200
Subject: [Ferret-talk] how to index and search multiple foreign keys in aaf
In-Reply-To: <76685bc50705150839k28bffb7bq9e71b7f7626a4228@mail.gmail.com>
References: <76685bc50705150839k28bffb7bq9e71b7f7626a4228@mail.gmail.com>
Message-ID: <20070515170512.GB8905@cordoba.webit.de>

On Tue, May 15, 2007 at 04:39:41PM +0100, Levent Ali wrote:
> Let's say I have a class Job that has a series of possible locations
> that are stored as a sequence of foreign keys
>
> class Job
>   acts_as_ferret {
>     :locations => {}
>   }
>
>   def locations
>     "1,25,23,15"
>     # ??
>   end
> end
>
> Is that a good way to store the location ids?
>
> How would I search for Jobs in a certain location
>
> Job.find_by_contents('locations(25)')

I'd say 'locations:25' but besides that this looks like it could work.
To be on the safe side do not separate the ids with ',' but with ' '.

Jens

-- 
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From steven at housecafemusic.com Tue May 15 13:31:04 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 15 May 2007 19:31:04 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <20070515170305.GA8905@cordoba.webit.de>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de>
Message-ID:

Actually it was the same error:

    ArgumentError ( isn't a valid directory argument. You should use
    either a String or a Directory):

My index folder is chmoded to 755... is that correct?

From kraemer at webit.de Tue May 15 13:35:44 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 15 May 2007 19:35:44 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To:
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de>
Message-ID: <20070515173544.GA9379@cordoba.webit.de>

On Tue, May 15, 2007 at 07:31:04PM +0200, Steven Garcia wrote:
> Actually it was the same error
>
> ArgumentError ( isn't a valid directory argument. You should use
> either a String or a Directory):
>
> my index folder is chmoded to 755... is that correct?

that depends, with 755 only the owner of the directory may write to it.

try

    chmod -R 777 index/

to check if the problem is with access rights. If it still doesn't work
then you can revert to the original access rights.

Jens

-- 
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From steven at housecafemusic.com Tue May 15 13:46:26 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 15 May 2007 19:46:26 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <20070515173544.GA9379@cordoba.webit.de>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de> <20070515173544.GA9379@cordoba.webit.de>
Message-ID: <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com>

That did the trick!

I was worried about security risks, but if you say 777 is okay then I
will run with it.

From seanmichaelbrown at gmail.com Tue May 15 13:49:47 2007
From: seanmichaelbrown at gmail.com (Sean Brown)
Date: Tue, 15 May 2007 13:49:47 -0400
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de> <20070515173544.GA9379@cordoba.webit.de> <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com>
Message-ID: <1086fb5f0705151049u1334004dmb3b9c35406724a7d@mail.gmail.com>

On 5/15/07, Steven Garcia wrote:
> That did the trick!
>
> I was worried about security risks, but if you say 777 is okay then I
> will run with it

No, no, no. 777 is not OK. That will open your site to Very Bad
Things (tm). Jens was saying that the user running your app needs to
have write access. You should figure out what user is running your
app (mongrel?, lighty?, etc.) and make sure to give that user (only)
write access.
-- 
Sean Brown

From kraemer at webit.de Tue May 15 13:58:42 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 15 May 2007 19:58:42 +0200
Subject: [Ferret-talk] more_like_this
In-Reply-To: <8bf9a47da8d09f2a6ee7f65d59ac3ba6@ruby-forum.com>
References: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com> <8bf9a47da8d09f2a6ee7f65d59ac3ba6@ruby-forum.com>
Message-ID: <20070515175842.GB9379@cordoba.webit.de>

On Tue, May 15, 2007 at 05:37:19PM +0200, Jacob Robbins wrote:
> Rob Lee wrote:
>
> >
> > Item.more_like_this(:field_names => [:title, :description, :status],
> > :append_to_query => Proc .... )
> >
>
> I don't mean to be nitpicky but more_like_this is an instance method not
> a class method. This has come up for me because more_like_this does not
> work for unsaved records in the current AAF which doesn't mesh with the
> rails convention of creating a new active record object to store user
> query params. I'd like to make a regular rails form using a blank object
> and then call more_like_this on that object to do a search.

This isn't supported by aaf but should be possible to do with a bit of
hacking :)

It'll get a bit harder if you want to do this with the DRb server,
since then you'll have to transfer your unsaved record over to the
server for the more_like_this query to be built. At the moment only the
id and class name are transferred with method calls.

Jens

-- 
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Tue May 15 14:04:37 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 15 May 2007 20:04:37 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <1086fb5f0705151049u1334004dmb3b9c35406724a7d@mail.gmail.com>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de> <20070515173544.GA9379@cordoba.webit.de> <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com> <1086fb5f0705151049u1334004dmb3b9c35406724a7d@mail.gmail.com>
Message-ID: <20070515180437.GC9379@cordoba.webit.de>

On Tue, May 15, 2007 at 01:49:47PM -0400, Sean Brown wrote:
> On 5/15/07, Steven Garcia wrote:
> > That did the trick!
> >
> > I was worried about security risks, but if you say 777 is okay then I
> > will run with it
>
> No, no, no. 777 is not OK. That will open your site to Very Bad
> Things (tm). Jens was saying that the user running your app needs to
> have write access. You should figure out what user is running your
> app (mongrel?, lighty?, etc.) and make sure to give that user (only)
> write access.

Full ACK, the 777 was only the short way of checking if it is a permissions problem in the first place.

In addition to what Sean wrote above - if you're running the DRb server it's this process that needs write access, not your mongrels (as they will only talk to the DRb server). Won't matter if you use the same user for mongrels and DRb, though.

So once you know the user, give him write permissions and revert the permissions to something more restrictive again.

Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From steven at housecafemusic.com Tue May 15 14:19:59 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 15 May 2007 20:19:59 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <20070515180437.GC9379@cordoba.webit.de>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de> <20070515173544.GA9379@cordoba.webit.de> <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com> <1086fb5f0705151049u1334004dmb3b9c35406724a7d@mail.gmail.com> <20070515180437.GC9379@cordoba.webit.de>
Message-ID: <3103CDA5-6746-4176-A278-78CE9A3D8CFA@housecafemusic.com>

Yikes, I totally didn't even consider the DRb server.. but I suppose it's kind of a must for production huh?

I really need to optimize my app... my VPS doesn't have any RAM left over

In the meantime, I will be studying this more. Any links to DRb integration would be welcome

cheers

SG

From steven at housecafemusic.com Tue May 15 14:21:23 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Tue, 15 May 2007 20:21:23 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <20070515180437.GC9379@cordoba.webit.de>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de> <20070515173544.GA9379@cordoba.webit.de> <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com> <1086fb5f0705151049u1334004dmb3b9c35406724a7d@mail.gmail.com> <20070515180437.GC9379@cordoba.webit.de>
Message-ID: <96DE74C3-73AB-464E-B5E2-B893A532870D@housecafemusic.com>

Almost forgot to ask an obvious question.. how do I find out who the user is?
I thought that chmod 775 would take care of this

From dburkes at netable.com Tue May 15 14:41:33 2007
From: dburkes at netable.com (Danny Burkes)
Date: Tue, 15 May 2007 20:41:33 +0200
Subject: [Ferret-talk] A Ferret/AAF success story
In-Reply-To: <95a8e0ab09dd1f5f29b5b544cc40435c@ruby-forum.com>
References: <4807400b8c3b7b00f439b892d5bc6450@ruby-forum.com> <20070514182632.GH27534@cordoba.webit.de> <95a8e0ab09dd1f5f29b5b544cc40435c@ruby-forum.com>
Message-ID: <9d1f2bcff81abe32ba23339905b70e8b@ruby-forum.com>

>> I think that multi-lingual tokenizer could be useful for other users of
>> Ferret, too ;-)
>>
>
> We'll definitely open-source that - just let me clean it up a bit, then
> we'll post it at http://svn.lingr.com. I'll make an announcement here
> when it's ready.
>

Voila- http://blog.lingr.com/2007/05/a_new_plugin.html

- Danny

--
Posted via http://www.ruby-forum.com/.

From kyle at casttv.com Tue May 15 14:58:53 2007
From: kyle at casttv.com (Kyle Maxwell)
Date: Tue, 15 May 2007 11:58:53 -0700
Subject: [Ferret-talk] Large File Support? (FAQs are broken)
Message-ID: <47699a8d0705151158j732fd2c5w82b0cc657e029f86@mail.gmail.com>

How does one compile Ferret with large file support? BTW, the FAQs section of the Ferret website is down, and I couldn't find any help via Google, or the mailing list archives.

Thanks!

--
Kyle Maxwell
Software Engineer
CastTV, Inc
http://www.casttv.com

From j_d_robbins at yahoo.com Tue May 15 15:08:06 2007
From: j_d_robbins at yahoo.com (Jacob Robbins)
Date: Tue, 15 May 2007 21:08:06 +0200
Subject: [Ferret-talk] more_like_this
In-Reply-To: <20070515175842.GB9379@cordoba.webit.de>
References: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com> <8bf9a47da8d09f2a6ee7f65d59ac3ba6@ruby-forum.com> <20070515175842.GB9379@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
> On Tue, May 15, 2007 at 05:37:19PM +0200, Jacob Robbins wrote:
>> rails convention of creating a new active record object to store user
>> query params.
>> I'd like to make a regular rails form using a blank object
>> and then call more_like_this on that object to do a search.
>
> This isn't supported by aaf but should be possible to do with a bit of
> hacking :)
>
> It'll get a bit harder if you want to do this with the DRb server, since
> then you'll have to transfer your unsaved record over to the server for
> the more_like_this query to be built. Atm only id and class name
> are transferred with method calls.
>
> Jens

Thanks for checking into this Jens, i've done what i wanted by adding an instance method to aaf. In instance_methods.rb, right after the to_doc method, i added a to_ferret_query method. This avoids transferring the whole object when using the DRb server. Tell me what you think...

>>>>>>>>>>>>>>>>>>>
  # Turn this instance into a ferret query derived from its field values.
  # Empty fields are ignored. Can be used on unsaved records. Typical use is to make
  # ferret query from a new object initialized from posted form values.
  #
  # Example: college.to_ferret_query(:fuzz => 0.6)
  #   #=> "name:seattle~0.6 and name:university~0.6 and city:seattle~0.6"
  #
  #
  # === Options
  #
  # fuzz:: Default: nil. Float value for fuzziness to attach to search terms.
  # field_names:: Default: nil. (uses ferret indexed fields) Array of field names to use in query.
  # join_type:: Default: 'and'. String used to join query terms.
  # exclude:: Default: ['and', 'or']. Array of words to ignore in field values.
  def to_ferret_query(options = {})
    options = {
      :field_names => self.class.aaf_configuration[:ferret_fields].keys,
      :join_type => 'and',
      :exclude => ['and','or']
    }.update(options)

    terms = []
    options[:field_names].each do |field|
      # skip fields that have no value (nil)
      if val = self.send(field)
        val.to_s.split.each do |word|
          unless options[:exclude].include?(word.strip.downcase)
            terms << field.to_s + ':' + word + ( options[:fuzz] ? '~' + options[:fuzz].to_s : '' )
          end
        end
      end
    end
    terms.join(' ' + options[:join_type] + ' ')
  end
<<<<<<<<<<<<<<<<<<<<<<<<<<

--
Posted via http://www.ruby-forum.com/.

From newsletters_question at simpleweight.com Tue May 15 18:09:28 2007
From: newsletters_question at simpleweight.com (Scott)
Date: Wed, 16 May 2007 00:09:28 +0200
Subject: [Ferret-talk] What is the day to day maintenance requirements of ferret?
Message-ID:

What should I be doing to maintain ferret and help prevent index corruption?

Do I need to rebuild it manually every night/week/day?

--
Posted via http://www.ruby-forum.com/.

From tennisbum2002 at hotmail.com Tue May 15 20:17:44 2007
From: tennisbum2002 at hotmail.com (Aryk Grosz)
Date: Wed, 16 May 2007 02:17:44 +0200
Subject: [Ferret-talk] return ONLY total_hits without querying from real database
Message-ID: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com>

Hey guys,

I know I can run search(q).total_hits, but if I try to put :limit=>0 it gives me an error. I don't want it to actually query any of the results, I just want it to tell me how many total_hits I would have if I wanted to search it.

How can I do this?

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Wed May 16 07:34:09 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 16 May 2007 13:34:09 +0200
Subject: [Ferret-talk] return ONLY total_hits without querying from real database
In-Reply-To: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com>
References: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com>
Message-ID: <20070516113409.GD10018@cordoba.webit.de>

On Wed, May 16, 2007 at 02:17:44AM +0200, Aryk Grosz wrote:
> Hey guys,
>
> I know I can run search(q).total_hits, but if I try to put :limit=>0 it
> gives me an error. I don't want it actually query any of the results, I
> just want it to tell me how many total_hits I would have if I wanted to
> search it.
>
> How can I do this?

Model.total_hits(query)

Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Wed May 16 07:38:43 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 16 May 2007 13:38:43 +0200
Subject: [Ferret-talk] What is the day to day maintenance requirements of ferret?
In-Reply-To:
References:
Message-ID: <20070516113843.GE10018@cordoba.webit.de>

On Wed, May 16, 2007 at 12:09:28AM +0200, Scott wrote:
> What should I be doing to maintain ferret and help prevent index
> corruption?
>
> Do I need to rebuild it manually every night/week/day?

Usually you don't have to rebuild the index in regular intervals to prevent index corruption.

To maintain high performance, regular optimizing makes sense. How often that is useful depends on your usage pattern (frequent updates -> frequent optimizations).

That being said, if you have the CPU cycles and space, regular rebuilds won't hurt and will ensure your index always matches your database.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Wed May 16 07:55:57 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 16 May 2007 13:55:57 +0200
Subject: [Ferret-talk] more_like_this
In-Reply-To:
References: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com> <8bf9a47da8d09f2a6ee7f65d59ac3ba6@ruby-forum.com> <20070515175842.GB9379@cordoba.webit.de>
Message-ID: <20070516115557.GF10018@cordoba.webit.de>

On Tue, May 15, 2007 at 09:08:06PM +0200, Jacob Robbins wrote:
> Jens Kraemer wrote:
> > On Tue, May 15, 2007 at 05:37:19PM +0200, Jacob Robbins wrote:
> >> rails convention of creating a new active record object to store user
> >> query params.
> >> I'd like to make a regular rails form using a blank object
> >> and then call more_like_this on that object to do a search.
> >
> > This isn't supported by aaf but should be possible to do with a bit of
> > hacking :)
> >
> > It'll get a bit harder if you want to do this with the DRb server, since
> > then you'll have to transfer your unsaved record over to the server for
> > the more_like_this query to be built. Atm only id and class name
> > are transferred with method calls.
> >
> > Jens
>
> Thanks for checking into this Jens, i've done what i wanted by adding an
> instance method to aaf. In instance_methods.rb, right after the to_doc
> method, i added a to_ferret_query method. This avoids transferring the
> whole object when using the DRb server. Tell me what you think...

Perfectly fine if it works for you.

Aaf's more_like_this is more complicated, mainly because it tries to find out the 15 or so most relevant terms of your record's content to construct the query to support large documents (and it can even boost these single terms according to their relevance).

I'll look into refactoring aaf a bit so that in future versions more_like_this can be used on unsaved records, too.

Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Wed May 16 08:09:49 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 16 May 2007 14:09:49 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <96DE74C3-73AB-464E-B5E2-B893A532870D@housecafemusic.com>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de> <20070515173544.GA9379@cordoba.webit.de> <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com> <1086fb5f0705151049u1334004dmb3b9c35406724a7d@mail.gmail.com> <20070515180437.GC9379@cordoba.webit.de> <96DE74C3-73AB-464E-B5E2-B893A532870D@housecafemusic.com>
Message-ID: <20070516120949.GG10018@cordoba.webit.de>

On Tue, May 15, 2007 at 08:21:23PM +0200, Steven Garcia wrote:
> Almost forgot to ask an obvious question.. how do I find out who the
> user is? I thought that chmod 775 would take care of this

chmod is used to set the permissions on a file (or directory or whatever :-). Basically each file has three sets of permissions associated with it: the first applies to the owning user of the file, the second to the group that owns the file, and the third to everybody else.

chmod 755 means that the user owning the directory may read, write and enter the directory (7), but everybody else may only read and enter it (5 for group, 5 for others).

To find out the owning user/group of a file use ls -l:

drwxr-xr-x 4 jk users 4096 2007-01-10 11:09 index

Here the permissions are shown as 'rwx' (7) or 'r-x' (5), and you can see that the index dir belongs to the user named 'jk' and the group 'users'. So to the user 'jk' the first set of permissions applies. To anybody else who is not 'jk', but a member of the 'users' group, the second set applies, and to everybody else the third one.
To find out who's running your mongrel, use the ps command:

$ ps aux | grep mongrel

The u option tells ps to show the user running each process in the first column of the output. Grep is only used to filter the process list; you might have to grep for something else (e.g. ruby) if you don't find your server by grepping for mongrel.

Say the username your mongrel runs as is 'www', then change the ownership of your index directory and all the files it contains like this:

chown -R www index/

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Wed May 16 08:18:40 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 16 May 2007 14:18:40 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <3103CDA5-6746-4176-A278-78CE9A3D8CFA@housecafemusic.com>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de> <20070515173544.GA9379@cordoba.webit.de> <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com> <1086fb5f0705151049u1334004dmb3b9c35406724a7d@mail.gmail.com> <20070515180437.GC9379@cordoba.webit.de> <3103CDA5-6746-4176-A278-78CE9A3D8CFA@housecafemusic.com>
Message-ID: <20070516121840.GH10018@cordoba.webit.de>

On Tue, May 15, 2007 at 08:19:59PM +0200, Steven Garcia wrote:
> Yikes, I totally didn't even consider the DRb server.. but I suppose it's
> kind of a must for production huh?
>
> I really need to optimize my app... my VPS doesn't have any RAM left
> over

Depending on how large your index is and how many mongrel instances you're running, DRb might actually save you some RAM.

> in the meantime, I will be studying this more.
> any links to DRB
> integration would be welcome

http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer
http://www.jkraemer.net/2007/3/24/acts_as_ferret-0-4-0-rie

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From steven at housecafemusic.com Wed May 16 09:04:18 2007
From: steven at housecafemusic.com (Steven Garcia)
Date: Wed, 16 May 2007 15:04:18 +0200
Subject: [Ferret-talk] AAF quirks in production mode
In-Reply-To: <20070516121840.GH10018@cordoba.webit.de>
References: <4555524C-91A6-4476-8AD9-121457BC571F@housecafemusic.com> <20070515170305.GA8905@cordoba.webit.de> <20070515173544.GA9379@cordoba.webit.de> <91B130A1-0DF7-4AA2-ABED-BDD2E694B87A@housecafemusic.com> <1086fb5f0705151049u1334004dmb3b9c35406724a7d@mail.gmail.com> <20070515180437.GC9379@cordoba.webit.de> <3103CDA5-6746-4176-A278-78CE9A3D8CFA@housecafemusic.com> <20070516121840.GH10018@cordoba.webit.de>
Message-ID:

Cheers Jens! I got everything working off DRb as advertised. :-)

AAF is the bomb!

From lukhnos at gmail.com Wed May 16 10:08:36 2007
From: lukhnos at gmail.com (Lukhnos d. Liu)
Date: Wed, 16 May 2007 16:08:36 +0200
Subject: [Ferret-talk] How we got rid of a bus error when using acts_as_ferret
Message-ID:

Hi,

We have just started using acts_as_ferret, and everything worked well until we started running into a bus error. The message in mongrel.log looked like:

..../active_support/core_ext/module/inclusion.rb:4: [BUG] Bus Error

We were running on OS X. The same thing happened on Linux, only that the message was "segmentation fault."

That was no good. After some searching on the web, we found a few mailing list threads that described the same situation, but didn't come with any solution.

Then we tried to zero in, first by removing the :analyzer part, and the bus error was gone.
We started to suspect the analyzer that we were using:

:analyzer => Ferret::Analysis::RegExpAnalyzer.new(FerretHelper::GENERIC_ANALYSIS_REGEX, true)

Where GENERIC_ANALYSIS_REGEX is

/([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/

This is used, a la Jcode, to tokenize European-language words, numbers, and CJK chars.

Interestingly, we then started to wonder whether Mongrel's development mode was the culprit. But anyway, we just created the Analyzer beforehand, and put it somewhere (say in some lib/helpers/), and now we have something like:

acts_as_ferret({:fields => [...] }, { :analyzer => FerretHelper::GENERIC_ANALYZER })

And no more bus error / segmentation fault.

We don't really understand why sharing the analyzer made things better. But I just hope that posting this episode might help some people.

Cheers,
d.

--
Posted via http://www.ruby-forum.com/.

From alain.ravet+ferret at gmail.com Wed May 16 14:55:46 2007
From: alain.ravet+ferret at gmail.com (Alain Ravet)
Date: Wed, 16 May 2007 20:55:46 +0200
Subject: [Ferret-talk] bilingual site: exclude fields set from query
Message-ID:

Hi all,

Is there a way to have searches not use some indexed fields when processing a query?

context:
I have a model Foo that holds some information in two languages:
- text1_nl, text2_nl, text3_nl
and
- text1_en, text2_en, text3_en
Some other fields are common to both languages and indexed as well
- first_name, last_name

Depending on the visitor's language choice I need to exclude the first three or the last three fields when processing the query. Is this doable relatively simply?
I guess I could use two indexes, but I'd like to keep using acts_as_ferret if possible.

TIA.
Alain Ravet
--------
http://blog.ravet.com

From tennisbum2002 at hotmail.com Wed May 16 15:29:56 2007
From: tennisbum2002 at hotmail.com (Aryk Grosz)
Date: Wed, 16 May 2007 21:29:56 +0200
Subject: [Ferret-talk] return ONLY total_hits without querying from real databa
In-Reply-To: <20070516113409.GD10018@cordoba.webit.de>
References: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com> <20070516113409.GD10018@cordoba.webit.de>
Message-ID: <58b3e3e69884804d862256880bb0eac3@ruby-forum.com>

Hi Jens,

I don't have that method, could it have something to do with the fact that I'm using

Ferret::MixinActs::ARFerret

Do I need to upgrade or can I just add that function?

Aryk

Jens Kraemer wrote:
> On Wed, May 16, 2007 at 02:17:44AM +0200, Aryk Grosz wrote:
>> Hey guys,
>>
>> I know I can run search(q).total_hits, but if I try to put :limit=>0 it
>> gives me an error. I don't want it actually query any of the results, I
>> just want it to tell me how many total_hits I would have if I wanted to
>> search it.
>>
>> How can I do this?
>
> Model.total_hits(query)
>
> Jens
>
> --
> Jens Krämer
> webit! Gesellschaft für neue Medien mbH
> Schnorrstraße 76 | 01069 Dresden
> Telefon +49 351 46766-0 | Telefax +49 351 46766-66
> kraemer at webit.de | www.webit.de
>
> Amtsgericht Dresden | HRB 15422
> GF Sven Haubold, Hagen Malessa

--
Posted via http://www.ruby-forum.com/.
From j_d_robbins at yahoo.com Wed May 16 16:08:28 2007
From: j_d_robbins at yahoo.com (Jacob Robbins)
Date: Wed, 16 May 2007 22:08:28 +0200
Subject: [Ferret-talk] more_like_this
In-Reply-To: <20070516115557.GF10018@cordoba.webit.de>
References: <4e4ccf2fac9797b28a8f2bfb855089e7@ruby-forum.com> <8bf9a47da8d09f2a6ee7f65d59ac3ba6@ruby-forum.com> <20070515175842.GB9379@cordoba.webit.de> <20070516115557.GF10018@cordoba.webit.de>
Message-ID: <54c55fbd0ce56c90148e80d96bc1dd0a@ruby-forum.com>

> Aaf's more_like_this is more complicated, mainly because it tries to
> find out the 15 or so most relevant terms of your record's content to
> construct the query to support large documents (and it can even boost
> these single terms according to their relevance).
>

Oh, now i get it. Yeah i run into this a lot with my deployment because we don't index big documents and most of ferret is geared for them. I use ferret to help users find bands, recordings and labels that are commonly misspelled. So for me... fuzzy searching: good, stopwords: bad.

--
Posted via http://www.ruby-forum.com/.

From doug.arogos at gmail.com Wed May 16 16:42:45 2007
From: doug.arogos at gmail.com (Doug Smith)
Date: Wed, 16 May 2007 13:42:45 -0700
Subject: [Ferret-talk] bilingual site: exclude fields set from query
In-Reply-To:
References:
Message-ID: <42d8808f0705161342i35730da3k449398a2dbf0fb52@mail.gmail.com>

I would set up your models differently, using Single-Table Inheritance (STI). (You can get more info on it by Googling "rails single table inheritance".)

For your example, you'd have a table with columns like:

id, type, first_name, last_name, text1, text2, text3

Then, you'd have a model like:

class Text < ActiveRecord::Base
  acts_as_ferret(... declare all fields here )
end

You'd then subclass this with two other classes:

class EnglishText < Text
end

class DutchText < Text
end

Then, to search one or the other, use:

EnglishText.find_by_contents("english query")
DutchText.find_by_contents("dutch query")

That's a rough idea without exact code, but it should hopefully get you started.

Thanks,

Doug

On 5/16/07, Alain Ravet wrote:
>
> Hi all,
>
> Is there a way to have searches no use some indexed fields, when
> processing a query?
>
> context:
> I have a model Foo that holds some information in two languages :
> - text1_nl, text2_nl, text3_nl
> and
> - text1_en, text2_en, text3_en
> Some other fields are common to both languages and indexed as well
> - first_name, last_name
>
> Depending on the visitor language choice I need to exclude the first
> three, or last three fields when query processing. Is this doable
> relatively simply?
> I guess I could use two indexes, but I'd like to keep using
> acts_as_ferret if possible.
>
>
> TIA.
>
>
> Alain Ravet
> --------
> http://blog.ravet.com
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20070516/23c3c226/attachment-0001.html

From alain.ravet+ferret at gmail.com Wed May 16 17:43:09 2007
From: alain.ravet+ferret at gmail.com (Alain Ravet)
Date: Wed, 16 May 2007 23:43:09 +0200
Subject: [Ferret-talk] bilingual site: exclude fields set from query
In-Reply-To: <42d8808f0705161342i35730da3k449398a2dbf0fb52@mail.gmail.com>
References: <42d8808f0705161342i35730da3k449398a2dbf0fb52@mail.gmail.com>
Message-ID:

Doug

> I would setup your models differently, using Single-Table Inheritance

STI would be impractical (I don't want to add a useless "type" column), but plain inheritance sounds like a very good idea.
> class Text < ActiveRecord::Base
> acts_as_ferret(... declare all fields here )
> end
>
> You'd then subclass this with two other classes:
>
> class EnglishText < Text
> end
> class DutchText < Text
> end

I guess you meant:

class Text < ActiveRecord::Base
end

class EnglishText < Text
  acts_as_ferret :fields => [:name, :text_en]
end

class DutchText < Text
  acts_as_ferret :fields => [:name, :text_nl]
end

That would indeed allow calling:

> EnglishText.find_by_contents("english query")
> DutchText.find_by_contents ("dutch query")

Clean and simple. Thanks a lot.

Alain Ravet
----
http://blog.ravet.com

From hakita at gmail.com Thu May 17 12:02:55 2007
From: hakita at gmail.com (py landanger)
Date: Thu, 17 May 2007 18:02:55 +0200
Subject: [Ferret-talk] How to make a Tag cloud with Ferret ?
In-Reply-To: <1179156294.3867.21.camel@localhost.localdomain>
References: <133fb1092bb336168950799ee4d71399@ruby-forum.com> <1179156294.3867.21.camel@localhost.localdomain>
Message-ID: <4f657c52f1075924341aa6be0dd3a71d@ruby-forum.com>

Thank you John ^^

--
Posted via http://www.ruby-forum.com/.

From brodaigh at gmail.com Fri May 18 04:12:19 2007
From: brodaigh at gmail.com (Jos)
Date: Fri, 18 May 2007 10:12:19 +0200
Subject: [Ferret-talk] select model & field for search
Message-ID: <145a41703dab7fa95e9d0dbe0c260981@ruby-forum.com>

hi,

I was wondering if someone could help me? I am trying to create a select option for the ferret search in my app. I want users to be able to select from a range of fields and models to limit their query results.

So far i have a select box:

<%= select(:searchbox, :category, %w{everything tags name products })%>

but i have no idea how to set this up to limit the search to the selected category. I know you can limit a query by searching for name:bob, but this is too difficult for users.

Is there a nice way to code this?

Thanx,

Jos

--
Posted via http://www.ruby-forum.com/.
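
One possible approach to the select-box question above is to build the field-scoped query on the server, so users never have to type name:bob themselves. The following is only a minimal sketch under stated assumptions: CATEGORY_FIELD and scoped_query are hypothetical names, not part of Ferret or acts_as_ferret, and it assumes every category except "everything" corresponds to a single indexed field of the same name. If a category is really a separate model, calling find_by_contents on that model would be the alternative.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# Maps a select-box category to the indexed field it should search.
# The field names are assumptions taken from the select box values.
CATEGORY_FIELD = {
  'tags'     => 'tags',
  'name'     => 'name',
  'products' => 'products'
}

# Prefixes each word of the user's input with the chosen field,
# e.g. ("name", "bob smith") becomes "name:bob name:smith".
# 'everything' (or any unknown category) leaves the query untouched,
# so it is searched across all indexed fields.
def scoped_query(category, user_query)
  field = CATEGORY_FIELD[category]
  return user_query if field.nil?
  user_query.split.map { |word| "#{field}:#{word}" }.join(' ')
end

puts scoped_query('name', 'bob smith')
puts scoped_query('everything', 'bob smith')
```

The resulting string would then be passed to find_by_contents in place of the raw user input. Note that this sketch does not escape special query characters typed by the user.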
From kraemer at webit.de Fri May 18 13:50:41 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 18 May 2007 19:50:41 +0200
Subject: [Ferret-talk] return ONLY total_hits without querying from real databa
In-Reply-To: <58b3e3e69884804d862256880bb0eac3@ruby-forum.com>
References: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com> <20070516113409.GD10018@cordoba.webit.de> <58b3e3e69884804d862256880bb0eac3@ruby-forum.com>
Message-ID: <20070518175041.GA31786@cordoba.webit.de>

On Wed, May 16, 2007 at 09:29:56PM +0200, Aryk Grosz wrote:
> Hi Jens,
>
> I don't have that method, could it have something to do with the fact
> that Im using
>
> Ferret::MixinActs::ARFerret
>
> Do I need to upgrade or can I just add that function?

I'm not sure what you're using there - I was thinking it's acts_as_ferret because of the method name.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From druse.jon at gmail.com Fri May 18 13:53:39 2007
From: druse.jon at gmail.com (Jon Druse)
Date: Fri, 18 May 2007 19:53:39 +0200
Subject: [Ferret-talk] Sorting by score
In-Reply-To:
References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com>
Message-ID: <799709933e69b5fa631c8cb42d0b941e@ruby-forum.com>

> sort_fields << Ferret::Search::SortField::SCORE
> sort = Ferret::Search::Sort.new(sort_fields)
>
> You can pass the array of SortFields as the :sort parameter or even a
> sort string ("sponsored DESC, SCORE").

maybe i'm just a total noob, but where can i put this?

here's my search method ...

@results = Record.multi_search(params[:search_terms], [ Link, Post ], {:limit => :all})

hope that makes any sense

:-) Jon

--
Posted via http://www.ruby-forum.com/.
From kraemer at webit.de Fri May 18 14:04:06 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 18 May 2007 20:04:06 +0200
Subject: [Ferret-talk] Sorting by score
In-Reply-To: <799709933e69b5fa631c8cb42d0b941e@ruby-forum.com>
References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com> <799709933e69b5fa631c8cb42d0b941e@ruby-forum.com>
Message-ID: <20070518180406.GC31786@cordoba.webit.de>

On Fri, May 18, 2007 at 07:53:39PM +0200, Jon Druse wrote:
>
> > sort_fields << Ferret::Search::SortField::SCORE
> > sort = Ferret::Search::Sort.new(sort_fields)
> >
> > You can pass the array of SortFields as the :sort parameter or even a
> > sort string ("sponsored DESC, SCORE").
>
>
> maybe i'm just a total noob, but where can i put this?
>
> here's my search mothod ...
>
>
> @results = Record.multi_search(params[:search_terms], [ Link, Post
> ], {:limit => :all})

the :sort option belongs to the same hash as :limit.

here's a sample from the aaf unit tests:

sorting = [ Ferret::Search::SortField.new(:id) ]
result = Content.multi_search('*:title OR *:comment', [Comment], :sort => sorting)

note that I don't use the Sort class at all, an Array of SortFields is ok.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From tennisbum2002 at hotmail.com Fri May 18 14:33:22 2007
From: tennisbum2002 at hotmail.com (Aryk Grosz)
Date: Fri, 18 May 2007 20:33:22 +0200
Subject: [Ferret-talk] return ONLY total_hits without querying from real databa
In-Reply-To: <20070518175041.GA31786@cordoba.webit.de>
References: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com> <20070516113409.GD10018@cordoba.webit.de> <58b3e3e69884804d862256880bb0eac3@ruby-forum.com> <20070518175041.GA31786@cordoba.webit.de>
Message-ID:

Here is what acts_as_ferret.rb says, maybe this is a really old version or something.
# The Rails ActiveRecord Ferret Mixin.
#
# This mixin adds full text search capabilities to any Rails model.
#
# The current version emerged from on the original acts_as_ferret plugin done by
# Kasper Weibel and a modified version done by Thomas Lockney, which both can be
# found on the Ferret Wiki: http://ferret.davebalmain.com/trac/wiki/FerretOnRails.
#
# basic usage:
# include the following in your model class (specifiying the fields you want to get indexed):
# acts_as_ferret :fields => [ 'title', 'description' ]
#
# now you can use ModelClass.find_by_contents(query) to find instances of your model
# whose indexed fields match a given query. All query terms are required by default, but
# explicit OR queries are possible. This differs from the ferret default, but imho is the more
# often needed/expected behaviour (more query terms result in less results).
#
# Released under the MIT license.
#
# Authors:
# Kasper Weibel Nielsen-Refs (original author)
# Jens Kraemer (active maintainer)

--
Posted via http://www.ruby-forum.com/.

From msdaly at gmail.com Fri May 18 14:33:24 2007
From: msdaly at gmail.com (Mike Daly)
Date: Fri, 18 May 2007 20:33:24 +0200
Subject: [Ferret-talk] how to compile with large file support?
Message-ID:

Hi, I'm trying to figure out how to compile ferret with large file support, but none of the topics that discuss this actually say how this is done. Can someone please provide the info?

thanks.
-m

my exact problem:
http://www.ruby-forum.com/topic/94143#191630

this topic also discusses the issue:
http://www.ruby-forum.com/topic/84237#151791

this topic says that the FAQ should have the answer, which it doesn't:
http://www.ruby-forum.com/topic/84205#151312

--
Posted via http://www.ruby-forum.com/.

From druse.jon at gmail.com Fri May 18 14:56:07 2007
From: druse.jon at gmail.com (Jon Druse)
Date: Fri, 18 May 2007 20:56:07 +0200
Subject: [Ferret-talk] Sorting by score
In-Reply-To: <20070518180406.GC31786@cordoba.webit.de>
References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com>
	<799709933e69b5fa631c8cb42d0b941e@ruby-forum.com>
	<20070518180406.GC31786@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
> On Fri, May 18, 2007 at 07:53:39PM +0200, Jon Druse wrote:
>> here's my search method ...
>>
>>
>> @results = Record.multi_search(params[:search_terms], [ Link, Post
>> ], {:limit => :all})
>
> the :sort option belongs to the same hash as :limit.
>
> here's a sample from the aaf unit tests:
>
> sorting = [ Ferret::Search::SortField.new(:id) ]
> result = Content.multi_search('*:title OR *:comment',
> [Comment],
> :sort => sorting)
>
> note that I don't use the Sort class at all, an Array of SortFields is
> ok.


sorry i'm so lame... i thought we were sorting by score? my results
don't seem to change when i put that in there. :-)

Jon

-- 
Posted via http://www.ruby-forum.com/.

From marc at since1968.com Fri May 18 15:45:54 2007
From: marc at since1968.com (Marc Garrett)
Date: Fri, 18 May 2007 21:45:54 +0200
Subject: [Ferret-talk] find_by_contents + 'conditions' returning incorrect results
Message-ID:

I've read the other threads like this one
(http://www.ruby-forum.com/topic/78841) but I'm not sure what I'm doing
wrong.

Scenario: I need a full text search on financial institutions and to
constrain that list by state (=) or states (IN). My lenders model
includes a name and a state, among other fields.

In my lenders model:

  acts_as_ferret :fields => [:name]

Running this in script/console, I get a correct result (about 2100
hits):

  Lender.find_by_contents("bank")

But when I start constraining with active record I get weird results.

This turns up one hit, which is correct:

  Lender.find_by_contents("suntrust", {}, {:conditions=>["sta1 = 'DC'"]})

But this turns up zero hits, when it should be 11 hits according to my
SQL:

  Lender.find_by_contents("bank", {}, {:conditions=>["sta1 = 'DC'"]})

Also, an IN search doesn't appear to constrain results at all. This
should turn up 16 rows but it turns up 31 (all the suntrusts, not
constrained by state):

  Lender.find_by_contents("suntrust", {:conditions=>["sta1 IN ?", "('AL','DC', 'FL')"]})

Any ideas?

-- 
Posted via http://www.ruby-forum.com/.

From dickjr at gmail.com Fri May 18 17:13:00 2007
From: dickjr at gmail.com (Richard Jones)
Date: Fri, 18 May 2007 17:13:00 -0400
Subject: [Ferret-talk] roll my own TokenFilter subclass
Message-ID: <381eb1660705181413h420f731av807c7dd1a1109b24@mail.gmail.com>

Hi all,

I'd like to write my own TokenStream Filter (in lucene this would be a
subclass of a TokenFilter, which ferret seems to lack) but I'm not sure
how to go about it. Specifically, it's not clear how I'd create a
non-trivial TokenStream to pass out to any filters that wrapped mine.

Can anyone point me towards a code example? Thanks.

-- 
Richard Jones
dickjr at gmail.com

From marc at since1968.com Fri May 18 18:35:45 2007
From: marc at since1968.com (Marc Garrett)
Date: Sat, 19 May 2007 00:35:45 +0200
Subject: [Ferret-talk] find_by_contents + 'conditions' returning incorrect resu
In-Reply-To:
References:
Message-ID:

OK, solved my own problem--the answer was implicit in a lot of Jens'
responses here but I didn't grok it.

Here's the correct code:

  Lender.find_by_contents(params[:name], {:limit=>:all}, {:conditions=>["sta1 = 'DC'"]})

Reason: If you don't use the :limit option, find_by_contents limits the
number of results automatically, and does so BEFORE ActiveRecord applies
the conditions.

Hope this helps someone.

-- 
Posted via http://www.ruby-forum.com/.
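
The limit-before-conditions effect Marc describes can be reproduced in plain Ruby, with no Ferret or ActiveRecord involved. This is only a simulation under stated assumptions: the `Lender` struct, the `index_hits` and `search` helpers and the data are all made up for illustration, and the default limit of 10 is an assumption rather than something stated in this thread.

```ruby
# Simulation only: plain Ruby stands in for the Ferret index and the database.
Lender = Struct.new(:id, :name, :state)

# 50 hypothetical lenders, all matching "bank"; every fifth one is in DC.
LENDERS = (1..50).map do |i|
  Lender.new(i, "Bank #{i}", i % 5 == 0 ? 'DC' : 'VA')
end

DEFAULT_LIMIT = 10 # assumed default number of hits handed back by the index

# Step 1: the full-text index returns the ids of matching documents,
# cut off at the limit unless the limit is :all.
def index_hits(term, limit)
  ids = LENDERS.select { |l| l.name.downcase.include?(term) }.map(&:id)
  limit == :all ? ids : ids.first(limit)
end

# Step 2: the SQL-style condition only ever sees the ids from step 1.
def search(term, state, limit = DEFAULT_LIMIT)
  ids = index_hits(term, limit)
  LENDERS.select { |l| ids.include?(l.id) && l.state == state }
end

puts search('bank', 'DC').size        # limit applied first, then the condition
puts search('bank', 'DC', :all).size  # every hit reaches the condition
```

With the limit in place only the DC lenders that happen to be among the first ten index hits survive; with `:all` the condition is applied to the full hit list, which is the behaviour Marc was after.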

From dburkes at netable.com Fri May 18 18:41:53 2007
From: dburkes at netable.com (Danny Burkes)
Date: Sat, 19 May 2007 00:41:53 +0200
Subject: [Ferret-talk] roll my own TokenFilter subclass
In-Reply-To: <381eb1660705181413h420f731av807c7dd1a1109b24@mail.gmail.com>
References: <381eb1660705181413h420f731av807c7dd1a1109b24@mail.gmail.com>
Message-ID:

> Can anyone point me towards a code example? Thanks.

It's not exactly what you want, but you might glean some help from our
Multilingual Ferret Tools at

http://svn.lingr.com/plugins/multilingual_ferret_tools

Best Regards,

Danny

-- 
Posted via http://www.ruby-forum.com/.

From syrius.ml at no-log.org Fri May 18 19:33:59 2007
From: syrius.ml at no-log.org (syrius.ml at no-log.org)
Date: Sat, 19 May 2007 01:33:59 +0200
Subject: [Ferret-talk] issues with : in the content
Message-ID: <87ps4xafoe.87odkhafoe@87mz01afoe.message.id>

Hi,

I've discovered ferret and aaf this evening, I've just done some tests
and it seems perfect for my needs.
I'm indexing text data (title, description, etc) and also ethernet
hardware addresses (MAC).
Sorry if that sounds trivial but I can't find the way to correctly
index and achieve correct searches on MAC addresses.

If I do something like this:

  index = Index::Index.new()
  index << {:hwaddr => '00:11:22:33:44:55'}
  index.search_each('"11:11"') do |id, score|
    puts "Document #{id} found with a score of #{score}"
  end

it matches.
if i search '11\:11' it also matches.
if the search is '00*11*' or '*11*22*' it does not match.

if hwaddr = '00z11z22z33z44z55' it works as expected.

I tried with an untokenized index but that didn't help.

Should I escape : before indexing ? (that's not convenient)
Should I use another Analyzer ?

Any help would be appreciated.
Thanks in advance.
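
One way to read the behaviour reported above is sketched here in plain Ruby, without Ferret. It assumes two things that are not confirmed at this point in the thread: that the address is split into separate terms at each ':', and that a wildcard pattern is compared against one term at a time. `wildcard_match?` is a hypothetical helper written for this sketch, not a Ferret method.

```ruby
MAC = '00:11:22:33:44:55'

# Assumed tokenization: one term per run of non-colon characters.
TOKENS = MAC.scan(/[^:]+/)

# Hypothetical helper: match a '*' wildcard pattern against a single term.
def wildcard_match?(pattern, term)
  regexp = Regexp.new('\A' + Regexp.escape(pattern).gsub('\*', '.*') + '\z')
  !!(regexp =~ term)
end

p TOKENS

# No single term contains both "11" and "22", so the pattern finds nothing.
p TOKENS.any? { |term| wildcard_match?('*11*22*', term) }

# With 'z' as separator the value stays one term, and the pattern matches.
p '00z11z22z33z44z55'.scan(/[^:]+/).any? { |term| wildcard_match?('*11*22*', term) }
```

Under those assumptions a wildcard that spans several address parts can never match, while the 'z'-separated value matches because it is indexed as a single term.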

-- 

From bk at benjaminkrause.com Sat May 19 04:16:19 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Sat, 19 May 2007 10:16:19 +0200
Subject: [Ferret-talk] issues with : in the content
In-Reply-To: <87ps4xafoe.87odkhafoe@87mz01afoe.message.id>
References: <87ps4xafoe.87odkhafoe@87mz01afoe.message.id>
Message-ID: <6160B1C5-A025-4E9E-95F3-3D9C657A7F77@benjaminkrause.com>

Hey ..

what you should do is to write your own analyzer.. that splits
the HWAddress at the : and therefore stores each part of
the MAC address as a separate token.. this can be done using
the RegExpAnalyzer .. maybe like that:

  RegExpAnalyzer.new(/[^:]+/, true) [1]

I would then search via SpanNearQueries [2] to search for certain
MAC parts in a specific order.. like that

  query = SpanNearQuery.new(:slop => 5, :in_order => true)
  query << SpanTermQuery.new(:hwaddr, "11")
  query << SpanTermQuery.new(:hwaddr, "22")

this should find all items with 1122

Hope that helps ..

Ben

[1] http://ferret.davebalmain.com/api/classes/Ferret/Analysis/RegExpAnalyzer.html
[2] http://ferret.davebalmain.com/api/classes/Ferret/Search/Spans/SpanNearQuery.html

From bk at benjaminkrause.com Sat May 19 04:59:04 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Sat, 19 May 2007 10:59:04 +0200
Subject: [Ferret-talk] How to make a Tag cloud with Ferret ?
In-Reply-To: <4f657c52f1075924341aa6be0dd3a71d@ruby-forum.com>
References: <133fb1092bb336168950799ee4d71399@ruby-forum.com>
	<1179156294.3867.21.camel@localhost.localdomain>
	<4f657c52f1075924341aa6be0dd3a71d@ruby-forum.com>
Message-ID: <60B266AD-5964-465F-AE9A-AE75F4527045@benjaminkrause.com>

Hey John..

well, that's a tough question, esp. as there is no 'general rule' on
how a tag cloud should look like and which criteria should be taken
into account..

http://www.omdb.org/encyclopedia is based on ferret.. but we're
storing a :popularity for each tag.. and ferret is simply sorting its
index using the popularity ..
here's what we do [1]:

  def popular_categories_by_type( root_id )
    query = BooleanQuery.new
    query.add_query( TermQuery.new( :type, Category.to_s.downcase ), :must )
    query.add_query( TermQuery.new( :root_id, root_id.to_s ), :must )
    query.add_query( TermQuery.new( :is_assignable, '1' ), :must )
    order = orderfield( [ "popularity".to_sym ], :type => :integer, :reverse => true )
    objects = self.real_search( query, :limit => 30, :order => order )
  end

hope that helps ..

Ben

[1] http://bugs.omdb.org/browser/trunk/lib/omdb/ferret/local_search.rb

From syrius.ml at no-log.org Sat May 19 08:35:20 2007
From: syrius.ml at no-log.org (syrius.ml at no-log.org)
Date: Sat, 19 May 2007 14:35:20 +0200
Subject: [Ferret-talk] issues with : in the content
In-Reply-To: <6160B1C5-A025-4E9E-95F3-3D9C657A7F77@benjaminkrause.com>
	(Benjamin Krause's message of "Sat, 19 May 2007 10:16:19 +0200")
References: <87ps4xafoe.87odkhafoe@87mz01afoe.message.id>
	<6160B1C5-A025-4E9E-95F3-3D9C657A7F77@benjaminkrause.com>
Message-ID: <871whdjb42.87zm41hwjm@87y7jlhwjm.message.id>

Benjamin Krause writes:

> Hey ..
>
> what you should do is to write your own analyzer.. that splits
> the HWAddress at the : and therefore stores each part of
> the MAC address as a separate token.. this can be done using
> the RegExpAnalyzer .. maybe like that:
>
> RegExpAnalyzer.new(/[^:]+/, true) [1]
>
> I would then search via SpanNearQueries [2] to search for certain
> MAC parts in a specific order.. like that
>
> query = SpanNearQuery.new(:slop => 5, :in_order => true)
> query << SpanTermQuery.new(:hwaddr, "11")
> query << SpanTermQuery.new(:hwaddr, "22")
>
> this should find all items with 1122
>
> Hope that helps ..

Hey it does. Thanks.
I first thought it was a bug and I would have liked an easier
solution. (for ex: stop the Analyzer from considering ':' as a stop
word )
I don't think I need to use the RegExpAnalyzer for hwaddr since the
Standard one also cuts on ':'.
I'm going to use :slop=>1, :in_order => true

And I'll try to detect hwaddr search queries to feed SpanNearQuery
accordingly by looking for ':' in the query and see if the word
before ':' matches a fieldname. (if it doesn't and looks like a hwaddr
I'll feed SpanNearQuery)
Pretty sure that could be done in a nicer way. (don't hesitate to make
suggestions :))

Also if there's other ways to index mac addresses without splitting on
: I would be interested to read about them. (especially if I can use
the query without too much processing)

Anyway, Thanks again for the quick answer.

-- 

From eimorton at gmail.com Sat May 19 15:12:50 2007
From: eimorton at gmail.com (Erik Morton)
Date: Sat, 19 May 2007 15:12:50 -0400
Subject: [Ferret-talk] how to compile with large file support?
In-Reply-To:
References:
Message-ID: <5C151711-35BE-405A-A90E-C7D750CCF594@gmail.com>

Mike,

How are you coming on this? I just built an index that tops out at
just above 2GBs and I installed Ferret with the standard gem install
ferret routine.

  du -h indexes/final/
  2.0G    indexes/final/

I'm curious why I didn't encounter the same issue you did. I just
combined a 1.7GB index with four indexes of approximately 100MB each.

Erik

On May 18, 2007, at 2:33 PM, Mike Daly wrote:

> Hi,
>
> I'm trying to figure out how to compile ferret with large file
> support,
> but none of the topics that discuss this actually say How this is
> done.
> Can someone please provide the info?
>
> thanks.
> -m
>
>
> my exact problem:
> http://www.ruby-forum.com/topic/94143#191630
>
> this topic also discusses the issue:
> http://www.ruby-forum.com/topic/84237#151791
>
> this topic says that the FAQ should have the answer, which it doesn't:
> http://www.ruby-forum.com/topic/84205#151312
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

From kyle at casttv.com Sat May 19 15:31:49 2007
From: kyle at casttv.com (Kyle Maxwell)
Date: Sat, 19 May 2007 12:31:49 -0700
Subject: [Ferret-talk] how to compile with large file support?
In-Reply-To: <5C151711-35BE-405A-A90E-C7D750CCF594@gmail.com>
References: <5C151711-35BE-405A-A90E-C7D750CCF594@gmail.com>
Message-ID: <47699a8d0705191231v247fea45q2219f5e52f29188f@mail.gmail.com>

> How are you coming on this? I just built an index that tops out at
> just above 2GBs and I installed Ferret with the standard gem install
> ferret routine.

I'm attaching a patch against ferret trunk rev 770. It's got a little
cruft, but it fixes large file support in Ferret.

-- 
Kyle Maxwell
Software Engineer
CastTV, Inc
http://www.casttv.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ferret64.patch
Type: application/octet-stream
Size: 13244 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/ferret-talk/attachments/20070519/d7ecd096/attachment-0001.obj

From kyle at casttv.com Sat May 19 15:26:10 2007
From: kyle at casttv.com (Kyle Maxwell)
Date: Sat, 19 May 2007 12:26:10 -0700
Subject: [Ferret-talk] how to compile with large file support?
In-Reply-To: <5C151711-35BE-405A-A90E-C7D750CCF594@gmail.com>
References: <5C151711-35BE-405A-A90E-C7D750CCF594@gmail.com>
Message-ID: <47699a8d0705191226o16ffe315m1c240b553af03dbb@mail.gmail.com>

First, the limit is 2**31, so it's a little bit more than 2.14e9 bytes.

Second, large file support is compiled in by default. There's
just some stray ints that should be off_t, particularly in the storage
code. I'm going to submit the patch this weekend, after I clean out
some extra debug code.. If you don't store fields, you'll prolly be
fine.

-- 
Kyle Maxwell
Software Engineer
CastTV, Inc
http://www.casttv.com

From syrius.ml at no-log.org Sat May 19 16:35:14 2007
From: syrius.ml at no-log.org (syrius.ml at no-log.org)
Date: Sat, 19 May 2007 22:35:14 +0200
Subject: [Ferret-talk] issues with : in the content
In-Reply-To: <871whdjb42.87zm41hwjm@87y7jlhwjm.message.id> (syrius ml's
	message of "Sat, 19 May 2007 14:35:20 +0200")
References: <87ps4xafoe.87odkhafoe@87mz01afoe.message.id>
	<6160B1C5-A025-4E9E-95F3-3D9C657A7F77@benjaminkrause.com>
	<871whdjb42.87zm41hwjm@87y7jlhwjm.message.id>
Message-ID: <87fy5siml5.87ejlciml5@87d50wiml5.message.id>

syrius.ml at no-log.org writes:

> Also if there's other ways to index mac addresses without splitting on
> : I would be interested to read about them. (especially if I can use
> the query without too much processing)

Oh in fact what i want to use is the WhiteSpaceAnalyzer for the field
'hwaddr' ... (it seems i missed this one before) :)

-- 

From msdaly at gmail.com Sat May 19 22:44:28 2007
From: msdaly at gmail.com (Mike Daly)
Date: Sun, 20 May 2007 04:44:28 +0200
Subject: [Ferret-talk] how to compile with large file support?
In-Reply-To: <5C151711-35BE-405A-A90E-C7D750CCF594@gmail.com>
References: <5C151711-35BE-405A-A90E-C7D750CCF594@gmail.com>
Message-ID:

> How are you coming on this? I just built an index that tops out at
> just above 2GBs and I installed Ferret with the standard gem install
> ferret routine.

perhaps your index is just a few bytes under the max... my usage is at
3.5G.
i haven't done anything special, just using ferret and AAF gems:

--- MODEL CODE

  class MyModel < ActiveRecord::Base
    # think of body/title in terms of an average blog
    acts_as_ferret :fields => { 'body' => {}, 'title' => { :boost => 2 } }
  end

--- INDEX CODE

  # new index from scratch
  index = Ferret::Index::Index.new(
    MyModel.aaf_configuration[:ferret].dup.update(
      :auto_flush => false,
      :field_infos => MyModel.aaf_index.field_infos,
      :create => true))

  n = 0
  BATCH_SIZE = 1000
  while true
    records = MyModel.find(:all, :limit => BATCH_SIZE, :offset => n,
      :select => "id,#{MyModel.aaf_configuration[:ferret_fields].keys.join(',')}")
    break if (!records || records.length == 0)
    records.each do |record|
      index << record.to_doc # aaf method
    end
    n += BATCH_SIZE
  end

  index.flush
  index.optimize # 30+ minutes =(
  index.close

--- CONFIG

  > gem list | grep ferret
  acts_as_ferret (0.4.0)
  ferret (0.11.4)

  > uname -a
  Linux gentoo 2.6.20-hardened #3 SMP Fri Mar 30 19:27:10 UTC 2007 x86_64
  Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux

-- 
Posted via http://www.ruby-forum.com/.

From msdaly at gmail.com Sat May 19 22:47:48 2007
From: msdaly at gmail.com (Mike Daly)
Date: Sun, 20 May 2007 04:47:48 +0200
Subject: [Ferret-talk] how to compile with large file support?
In-Reply-To: <47699a8d0705191226o16ffe315m1c240b553af03dbb@mail.gmail.com>
References: <5C151711-35BE-405A-A90E-C7D750CCF594@gmail.com>
	<47699a8d0705191226o16ffe315m1c240b553af03dbb@mail.gmail.com>
Message-ID:

> large file support is compiled in by default. There's
> just some stray ints that should be off_t, particularly in the storage
> code. I'm going to submit the patch this weekend, after I clean out
> some extra debug code.. If you don't store fields, you'll prolly be
> fine.

cool, i hope this fixes the problem. i'll wait for the next gem version
and see what happens. =)

thanks
-m

-- 
Posted via http://www.ruby-forum.com/.
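
The remark about stray ints that should be off_t can be illustrated with a little arithmetic in plain Ruby. This is a sketch of the general failure mode only, not Ferret's storage code: `as_int32` is a hypothetical helper that shows what a byte offset looks like once it is squeezed into a signed 32-bit integer, and the 3,500,000,000-byte offset is just an example value beyond the 2**31 limit.

```ruby
INT32_LIMIT = 2**31 # 2_147_483_648 bytes, i.e. "a little bit more than 2.14e9"

# Hypothetical helper: reinterpret a non-negative offset as a signed
# 32-bit integer, the way a C int would hold it after truncation.
def as_int32(offset)
  ((offset + 2**31) % 2**32) - 2**31
end

puts INT32_LIMIT
puts as_int32(2**31 - 1)      # last offset that still fits
puts as_int32(2**31)          # first offset past the limit wraps negative
puts as_int32(3_500_000_000)  # an offset inside a 3.5 GB file
```

Offsets below 2**31 come through unchanged, while anything at or above it turns into a negative number, which is why an index can work up to roughly 2 GB and then fail even when large file support is compiled in.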

From newsletters_question at simpleweight.com Sun May 20 00:10:49 2007
From: newsletters_question at simpleweight.com (Scott)
Date: Sun, 20 May 2007 06:10:49 +0200
Subject: [Ferret-talk] What is the day to day maintenance requirements of ferre
In-Reply-To: <20070516113843.GE10018@cordoba.webit.de>
References: <20070516113843.GE10018@cordoba.webit.de>
Message-ID: <830659ada8083ac748386ad66b70ec3a@ruby-forum.com>

I guess I am coming from a database mindset, and with indexes, you need
to maintain them over time.

The other reason I ask: our search using ferret and aaf seems to lock up
about once a week for us. Rebuilding the index fixes the problem. If I
start and stop rails and mongrel, it does not fix it. The only thing that
causes the lock-up is a search.

Thanks for your reply


Jens Kraemer wrote:
> On Wed, May 16, 2007 at 12:09:28AM +0200, Scott wrote:
>> What should I be doing to maintain ferret and help prevent index
>> corruption?
>>
>> Do I need to rebuild it manually every night/week/day?
>
> Usually you don't have to rebuild the index in regular intervals to
> prevent index corruption.
>
> To maintain high performance, regular optimizing makes sense. How often
> that is useful depends on your usage pattern (frequent updates ->
> frequent optimizations).
>
> That being said, if you have the CPU cycles and space, regular rebuilds
> won't hurt and will ensure your index always matches your database.
>
>
> Jens
>
> --
> Jens Krämer
> webit! Gesellschaft für neue Medien mbH
> Schnorrstraße 76 | 01069 Dresden
> Telefon +49 351 46766-0 | Telefax +49 351 46766-66
> kraemer at webit.de | www.webit.de
>
> Amtsgericht Dresden | HRB 15422
> GF Sven Haubold, Hagen Malessa

-- 
Posted via http://www.ruby-forum.com/.

From art.spasky at gmail.com Mon May 21 02:38:36 2007
From: art.spasky at gmail.com (=?KOI8-R?B?4dLUxc0g89DB08vJyg==?=)
Date: Mon, 21 May 2007 09:38:36 +0300
Subject: [Ferret-talk] acts_as_ferret and russian language.
Message-ID: <99cbded90705202338x7a94f76boaf5f60876d5700f3@mail.gmail.com>

Hi.

Does acts_as_ferret support russian language?

My ruby hangs when I try to search a russian phrase.

Thank you in advance.

From bk at benjaminkrause.com Mon May 21 03:51:23 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Mon, 21 May 2007 09:51:23 +0200
Subject: [Ferret-talk] acts_as_ferret and russian language.
In-Reply-To: <99cbded90705202338x7a94f76boaf5f60876d5700f3@mail.gmail.com>
References: <99cbded90705202338x7a94f76boaf5f60876d5700f3@mail.gmail.com>
Message-ID: <15EDFE3F-32BB-403A-A2AE-1FED244BDF38@benjaminkrause.com>

On May 21, 2007, at 08:38, Артем Спаский wrote:

> Does acts_as_ferret support russian language?
>
> My ruby hangs when I try to search a russian phrase

yes, it does .. just make sure you are using utf8 .. we've been
using ferret/AAF with chinese, japanese, hebrew and latin characters
for quite some time..

Ben

From kraemer at webit.de Mon May 21 05:00:37 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 21 May 2007 11:00:37 +0200
Subject: [Ferret-talk] Sorting by score
In-Reply-To:
References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com>
	<799709933e69b5fa631c8cb42d0b941e@ruby-forum.com>
	<20070518180406.GC31786@cordoba.webit.de>
Message-ID: <20070521090037.GA31602@cordoba.webit.de>

On Fri, May 18, 2007 at 08:56:07PM +0200, Jon Druse wrote:
> Jens Kraemer wrote:
> > On Fri, May 18, 2007 at 07:53:39PM +0200, Jon Druse wrote:
> >> here's my search method ...
> >>
> >>
> >> @results = Record.multi_search(params[:search_terms], [ Link, Post
> >> ], {:limit => :all})
> >
> > the :sort option belongs to the same hash as :limit.
> >
> > here's a sample from the aaf unit tests:
> >
> > sorting = [ Ferret::Search::SortField.new(:id) ]
> > result = Content.multi_search('*:title OR *:comment',
> > [Comment],
> > :sort => sorting)
> >
> > note that I don't use the Sort class at all, an Array of SortFields is
> > ok.
>
>
> sorry i'm so lame...
> i thought we were sorting by score? my results
> don't seem to change when i put that in there. :-)

my fault, I just cut'n'pasted the code from the test.
replace the
  sorting = ...
line with:
  sorting = [ Ferret::Search::SortField::SCORE ]

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Mon May 21 05:52:54 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 21 May 2007 11:52:54 +0200
Subject: [Ferret-talk] return ONLY total_hits without querying from real databa
In-Reply-To:
References: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com>
	<20070516113409.GD10018@cordoba.webit.de>
	<58b3e3e69884804d862256880bb0eac3@ruby-forum.com>
	<20070518175041.GA31786@cordoba.webit.de>
Message-ID: <20070521095254.GB31602@cordoba.webit.de>

On Fri, May 18, 2007 at 08:33:22PM +0200, Aryk Grosz wrote:
> Here is what acts_as_ferret.rb says, maybe this is a really old version
> or something.

yes, seems so. You should upgrade to the current stable version.

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Mon May 21 07:33:21 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 21 May 2007 13:33:21 +0200
Subject: [Ferret-talk] bilingual site: exclude fields set from query
In-Reply-To:
References:
Message-ID: <20070521113321.GC31602@cordoba.webit.de>

On Wed, May 16, 2007 at 08:55:46PM +0200, Alain Ravet wrote:
> Hi all,
>
> Is there a way to have searches not use some indexed fields, when
> processing a query?
>
> context:
> I have a model Foo that holds some information in two languages :
> - text1_nl, text2_nl, text3_nl
> and
> - text1_en, text2_en, text3_en
> Some other fields are common to both languages and indexed as well
> - first_name, last_name
>
> Depending on the visitor language choice I need to exclude the first
> three, or last three fields when query processing. Is this doable
> relatively simply?
> I guess I could use two indexes, but I'd like to keep using
> acts_as_ferret if possible.

A query string like

  text1_nl|text2_nl|text3_nl|first_name|last_name:query

will only search for query in the named fields.

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From fake1 at rh-productions.ch Mon May 21 10:21:28 2007
From: fake1 at rh-productions.ch (masone)
Date: Mon, 21 May 2007 16:21:28 +0200
Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_conten
In-Reply-To: <20070502123133.GD4687@cordoba.webit.de>
References: <79ede73a047bb56bad3e5d0f05763f0f@ruby-forum.com>
	<20070502123133.GD4687@cordoba.webit.de>
Message-ID:

I just ran into a similar problem using find_by_contents through
has_many relationships. Jens' class_methods.rb Rev 187 solved my issues.
Thanks!

masone

Jens Kraemer wrote:
> Hi!
>
> That's a nice bug you've found there - getting the real result count is
> a bit tricky when the result set is limited by both ferret's :limit
> option and active record conditions. It's one of those things I always
> wanted to fix but finally forgot about ;-)
>
> I committed a possible fix right now that comes at the cost of an
> additional ferret query and a "select count(*) where ...". Would be nice
> if you could try out the current trunk of aaf to see if this fixes the
> problem.
>
> If the additional queries and counting are too slow for you (but first,
> give it a try ;-), you could eliminate the need for active record
> conditions by indexing the forum_id and creationDate columns as
> untokenized values and let ferret handle them.
>
> Jens
>
>
>
> On Wed, May 02, 2007 at 12:12:08PM +0200, Chengcai He wrote:
>> options = default_options.merge options
>>
>> end
>> http://www.ruby-forum.com/topic/93822
>>
>> thanks!
>>
>> --
>> Posted via http://www.ruby-forum.com/.
>> _______________________________________________
>> Ferret-talk mailing list
>> Ferret-talk at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/ferret-talk
>>
>
> --
> Jens Krämer
> webit! Gesellschaft für neue Medien mbH
> Schnorrstraße 76 | 01069 Dresden
> Telefon +49 351 46766-0 | Telefax +49 351 46766-66
> kraemer at webit.de | www.webit.de
>
> Amtsgericht Dresden | HRB 15422
> GF Sven Haubold, Hagen Malessa

-- 
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Mon May 21 10:46:09 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 21 May 2007 16:46:09 +0200
Subject: [Ferret-talk] select model & field for search
In-Reply-To: <145a41703dab7fa95e9d0dbe0c260981@ruby-forum.com>
References: <145a41703dab7fa95e9d0dbe0c260981@ruby-forum.com>
Message-ID: <20070521144609.GD31602@cordoba.webit.de>

On Fri, May 18, 2007 at 10:12:19AM +0200, Jos wrote:
> hi, I was wondering if someone could help me? I am trying to create a
> select option for the ferret search in my app. I want users to be able
> to select from a range of fields and models to limit their query
> results.
>
> So far i have a select box;
>
>
> <%= select(:searchbox, :category, %w{everything tags name products
> })%>
>
> but i have no idea how to set this to limit the search within the
> categories.
>
> I know you can limit query by querying name:bob but this is too
> difficult for users. Is there a nice way to code this?

you could build the query string programmatically, or build query
objects directly from your user's input (the Ferret API docs have more
info about the various kinds of query types available)

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From alain.ravet+ferret at gmail.com Mon May 21 12:03:39 2007
From: alain.ravet+ferret at gmail.com (Alain Ravet)
Date: Mon, 21 May 2007 18:03:39 +0200
Subject: [Ferret-talk] bilingual site: exclude fields set from query
In-Reply-To: <20070521113321.GC31602@cordoba.webit.de>
References: <20070521113321.GC31602@cordoba.webit.de>
Message-ID:

Jens,

> A query string like
> text1_nl|text2_nl|text3_nl|first_name|last_name:query
> will only search for query in the named fields.

Thanks.

Is this the proper syntax for multi-words queries:

  first_name|last_name:(jo* OR va*)

?

Alain

From bk at benjaminkrause.com Mon May 21 12:49:49 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Mon, 21 May 2007 18:49:49 +0200
Subject: [Ferret-talk] bilingual site: exclude fields set from query
In-Reply-To:
References: <20070521113321.GC31602@cordoba.webit.de>
Message-ID: <50C01D36-39FD-47B8-A425-D9D7BFD5736E@benjaminkrause.com>

Hey Alain,

finally jumped on board? :-) hope to see you in berlin in a few
months ..

> Is this the proper syntax for multi-words queries:
>
> first_name|last_name:(jo* OR va*)

i would suggest to use the OO-Syntax for more complex queries ..
like that:

  query = BooleanQuery.new
  [ :first_name, :last_name ].each do |key|
    query.add_query( PrefixQuery.new( key, 'jo' ), :should )
  end

for a people-livesearch, i do something like [1]:

  def build_livesearch_query( fields, q )
    q = filter_special_characters( q ) # custom method
    query = BooleanQuery.new
    fields.uniq.each do |field|
      sq = SpanNearQuery.new(:in_order => true, :slop => 1)
      terms = analyze_query(q, field) # custom method
      terms.each do |term|
        sq << SpanPrefixQuery.new( field, term )
      end
      query << sq
    end
    query
  end

i call that like

  build_livesearch_query( [:first_name, :last_name, :email], q ) # q is an array of search words

Ben

[1] all of the code at
http://bugs.omdb.org/browser/trunk/lib/omdb/ferret/local_search.rb

From alain.ravet+ferret at gmail.com Mon May 21 13:10:25 2007
From: alain.ravet+ferret at gmail.com (Alain Ravet)
Date: Mon, 21 May 2007 19:10:25 +0200
Subject: [Ferret-talk] bilingual site: exclude fields set from query
In-Reply-To: <50C01D36-39FD-47B8-A425-D9D7BFD5736E@benjaminkrause.com>
References: <20070521113321.GC31602@cordoba.webit.de>
	<50C01D36-39FD-47B8-A425-D9D7BFD5736E@benjaminkrause.com>
Message-ID:

Hi Ben,

> Hey Alain,
> finally jumped on board? :-) hope to see you in berlin in a few months

Can't wait to register. My credit card is on the starting-blocks.

> > Is this the proper syntax for multi-words queries:
> > first_name|last_name:(jo* OR va*)
>
> i would suggest to use the OO-Syntax for more complex queries .. like
> query = BooleanQuery.new

I'm refactoring/upgrading an "old" (1-year) app, where I used to do
all the ferret index stuff myself (callbacks, queries, ..), to the
mucho simpler interface offered by acts_as_ferret. AFAIK, aaf only
works with string queries, so if there is a string-based solution that
works, I'd rather use this one. If there is none, I'll bite the bullet
and follow your advice.
"A man's got to do, what a man's got to do" Alain -- Alain Ravet -------- http://blog.ravet.com From druse.jon at gmail.com Mon May 21 15:02:05 2007 From: druse.jon at gmail.com (Jon Druse) Date: Mon, 21 May 2007 21:02:05 +0200 Subject: [Ferret-talk] Sorting by score In-Reply-To: <20070521090037.GA31602@cordoba.webit.de> References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com> <799709933e69b5fa631c8cb42d0b941e@ruby-forum.com> <20070518180406.GC31786@cordoba.webit.de> <20070521090037.GA31602@cordoba.webit.de> Message-ID: <095365fa28dd5beedf100b3433b2938b@ruby-forum.com> Jens Kraemer wrote: > > my fault, I just cut'n'pasted the code from the test. > replace the > sorting = ... > line with: > sorting = [ Ferret::Search::SortField::SCORE ] so i must be giving you a headache. :( when i put this in sort = [ Ferret::Search::SortField::SCORE ] it works just fine, but if i put this in sort = [ Ferret::Search::SortField::SCORE(:reverse => true) ] i get undefined method `SCORE' for Ferret::Search::SortField:Class there's probably something really really simple that i missed. here's the whole function for reference. sort = [ Ferret::Search::SortField::SCORE(:reverse => true) ] @results = Site.multi_search(params[:search_terms], [ Link, Post ], {:limit => :all, :sort => sort }) -- Posted via http://www.ruby-forum.com/. From bk at benjaminkrause.com Mon May 21 15:02:49 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Mon, 21 May 2007 21:02:49 +0200 Subject: [Ferret-talk] bilingual site: exclude fields set from query In-Reply-To: References: <20070521113321.GC31602@cordoba.webit.de> <50C01D36-39FD-47B8-A425-D9D7BFD5736E@benjaminkrause.com> Message-ID: <39E31DD1-76AB-4BBA-AE80-EFF2A588C260@benjaminkrause.com> > > I'm refactoring/upgrading an "old" (1-year) app, where I used to do > all the ferret index stuff myself (callbacks, queries, ..), to the > mucho simpler interface offered by acts_as_ferret. 
> AFAIK, aaf only
> works with string queries, so if there is a string-based solution that
> works, I'd rather use this one. If there is none, I'll bite the bullet
> and follow your advice. "A man's got to do, what a man's got to do"

Alain,

i'm no AAF expert, but afaik most methods of aaf are using
find_id_by_contents.. which is accepting a query parameter..
that query parameter gets forwarded to ferrets own search_each.
search_each accepts strings or ferret query objects.. so i would
assume that you can pass a OO-Query to AAF.

And anything else would have surprised me.. as Jens code is quite
sophisticated :-)

Ben

From bk at benjaminkrause.com Mon May 21 15:14:14 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Mon, 21 May 2007 21:14:14 +0200
Subject: [Ferret-talk] Sorting by score
In-Reply-To: <095365fa28dd5beedf100b3433b2938b@ruby-forum.com>
References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com>
	<799709933e69b5fa631c8cb42d0b941e@ruby-forum.com>
	<20070518180406.GC31786@cordoba.webit.de>
	<20070521090037.GA31602@cordoba.webit.de>
	<095365fa28dd5beedf100b3433b2938b@ruby-forum.com>
Message-ID: <3FADF41E-702C-4D9D-A55A-37B5489193AC@benjaminkrause.com>

Hey ..

try SCORE_REV instead of SCORE [1] .. :-)

Ben

[1] http://ferret.davebalmain.com/api/classes/Ferret/Search/SortField.html

From druse.jon at gmail.com Mon May 21 15:24:47 2007
From: druse.jon at gmail.com (Jon Druse)
Date: Mon, 21 May 2007 21:24:47 +0200
Subject: [Ferret-talk] Sorting by score
In-Reply-To: <3FADF41E-702C-4D9D-A55A-37B5489193AC@benjaminkrause.com>
References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com>
	<799709933e69b5fa631c8cb42d0b941e@ruby-forum.com>
	<20070518180406.GC31786@cordoba.webit.de>
	<20070521090037.GA31602@cordoba.webit.de>
	<095365fa28dd5beedf100b3433b2938b@ruby-forum.com>
	<3FADF41E-702C-4D9D-A55A-37B5489193AC@benjaminkrause.com>
Message-ID: <20f8dd8c6676ee32d147c69b632c80f0@ruby-forum.com>

Benjamin Krause wrote:
> Hey ..
>
> try SCORE_REV instead of SCORE [1] ..
:-) > > Ben > > [1] http://ferret.davebalmain.com/api/classes/Ferret/Search/ > SortField.html I tried that and seemed not to have any effect. does Ferret::Search::SortField::SCORE reference the same thing as result.ferret_score ? -- Posted via http://www.ruby-forum.com/. From bk at benjaminkrause.com Mon May 21 15:40:07 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Mon, 21 May 2007 21:40:07 +0200 Subject: [Ferret-talk] Sorting by score In-Reply-To: <20f8dd8c6676ee32d147c69b632c80f0@ruby-forum.com> References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com> <799709933e69b5fa631c8cb42d0b941e@ruby-forum.com> <20070518180406.GC31786@cordoba.webit.de> <20070521090037.GA31602@cordoba.webit.de> <095365fa28dd5beedf100b3433b2938b@ruby-forum.com> <3FADF41E-702C-4D9D-A55A-37B5489193AC@benjaminkrause.com> <20f8dd8c6676ee32d147c69b632c80f0@ruby-forum.com> Message-ID: <97055A7A-3798-4CDD-9A61-9F66D5808397@benjaminkrause.com> On May 21, 2007, at 21:24, Jon Druse wrote: >> try SCORE_REV instead of SCORE [1] .. :-) >> >> Ben >> >> [1] http://ferret.davebalmain.com/api/classes/Ferret/Search/ >> SortField.html > > I tried that and seemed not to have any effect. does > Ferret::Search::SortField::SCORE reference the same thing as > result.ferret_score ? that might be a bug.. ? i never tried SCORE_REV ... did you try SortField.new(:score, :type => float, :reverse => true) ? the score-sorting and the ferret_score should be the same .. 
Ben From druse.jon at gmail.com Mon May 21 16:00:28 2007 From: druse.jon at gmail.com (Jon Druse) Date: Mon, 21 May 2007 22:00:28 +0200 Subject: [Ferret-talk] Sorting by score In-Reply-To: <97055A7A-3798-4CDD-9A61-9F66D5808397@benjaminkrause.com> References: <85bfb43ed70f65124de978016e63e3a8@ruby-forum.com> <799709933e69b5fa631c8cb42d0b941e@ruby-forum.com> <20070518180406.GC31786@cordoba.webit.de> <20070521090037.GA31602@cordoba.webit.de> <095365fa28dd5beedf100b3433b2938b@ruby-forum.com> <3FADF41E-702C-4D9D-A55A-37B5489193AC@benjaminkrause.com> <20f8dd8c6676ee32d147c69b632c80f0@ruby-forum.com> <97055A7A-3798-4CDD-9A61-9F66D5808397@benjaminkrause.com> Message-ID: <2c8e7c806c451e982e35cd15c5eb0c74@ruby-forum.com> Benjamin Krause wrote: > > did you try SortField.new(:score, :type => float, :reverse => true) ? > > the score-sorting and the ferret_score should be the same .. > > Ben that produces an error, first for the 'float' portion, then it says Cannot sort by field "score". It doesn't exist in the index. maybe i'm missing something in the Model.. but if i just put result.ferret_score, that works just fine.. thanks for your help :-) jon -- Posted via http://www.ruby-forum.com/. From tennisbum2002 at hotmail.com Mon May 21 22:18:05 2007 From: tennisbum2002 at hotmail.com (Aryk Grosz) Date: Tue, 22 May 2007 04:18:05 +0200 Subject: [Ferret-talk] return ONLY total_hits without querying from real databa In-Reply-To: <20070521095254.GB31602@cordoba.webit.de> References: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com> <20070516113409.GD10018@cordoba.webit.de> <58b3e3e69884804d862256880bb0eac3@ruby-forum.com> <20070518175041.GA31786@cordoba.webit.de> <20070521095254.GB31602@cordoba.webit.de> Message-ID: Hey Jens, I upgraded to the latest version. Everything works great, it even queries the results more efficiently. Great job! I don't see any method for total_hits for multi_search. The model.total_hits works fine when you are just searching on one model. 
Any idea how to do only total_hits with multi_search?

Also, I added a little fix to multi_search to allow users to include
associations even if it works on one model but not the other. Here is
what the code looks like. It currently works great on one-level include
and somewhat hacky on two-level associations. Maybe you can find some
use for it?

http://pastie.caboo.se/63435

Aryk

--
Posted via http://www.ruby-forum.com/.

From stuart.hungerford at anu.edu.au Mon May 21 22:11:36 2007
From: stuart.hungerford at anu.edu.au (Stuart Hungerford)
Date: Tue, 22 May 2007 12:11:36 +1000
Subject: [Ferret-talk] acts as ferret, make index in /tmp then copy back to avoid NFS slowness?
Message-ID: <46525158.3060803@anu.edu.au>

Hi,

We're happily using Ferret through acts_as_ferret on a Rails website,
but we have an issue with rebuilding indexes.

The Rails project directories are on an NFS mounted volume and
running Model.rebuild_index there seems to take 60-100 times longer
than it does on a local filesystem (e.g. /tmp).

What we'd like to do is rebuild our indexes in say /tmp, then copy
them back when done to the NFS volume with the Rails directories.

Is there any simple way to get Model.rebuild_index to build indexes
in a different place, or do we need to take another approach?

Thanks in advance,

Stu

--
Stuart Hungerford
ANUSF Data Intensive Projects

From kraemer at webit.de Tue May 22 05:28:01 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 22 May 2007 11:28:01 +0200
Subject: [Ferret-talk] return ONLY total_hits without querying from real databa
In-Reply-To:
References: <23e01a45f233c1f2b9045db77ae9f747@ruby-forum.com> <20070516113409.GD10018@cordoba.webit.de> <58b3e3e69884804d862256880bb0eac3@ruby-forum.com> <20070518175041.GA31786@cordoba.webit.de> <20070521095254.GB31602@cordoba.webit.de>
Message-ID: <20070522092801.GE31602@cordoba.webit.de>

On Tue, May 22, 2007 at 04:18:05AM +0200, Aryk Grosz wrote:
> Hey Jens,
>
> I upgraded to the latest version.
Everything works great, it even > queries the results more efficiently. Great job! thanks :) > I don't see any method for total_hits for multi_search. The > model.total_hits works fine when you are just searching on one model. > > Any idea how to do only total_hits with multi_search? That was indeed missing. I just committed a fix for this to trunk. now you can give total_hits additional model classes with the :models option: A.total_hits(query, :models => [B, C]) > Also, I added a little fix to multi_search to allow users to include > associations even if it works on one model but not the other. Here is > what the code looks like. It currently works great on one-level include > and somewhat hacky on two-level associations. Maybe you can find some > use for it? > > http://pastie.caboo.se/63435 thanks, I integrated this :) Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Tue May 22 05:33:36 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 22 May 2007 11:33:36 +0200 Subject: [Ferret-talk] acts as ferret, make index in /tmp then copy back to avoid NFS slowness? In-Reply-To: <46525158.3060803@anu.edu.au> References: <46525158.3060803@anu.edu.au> Message-ID: <20070522093336.GF31602@cordoba.webit.de> On Tue, May 22, 2007 at 12:11:36PM +1000, Stuart Hungerford wrote: > Hi, > > We're happily using Ferret through acts_as_ferret on a Rails website, > but we have an issue with rebuilding indexes. > > The Rails project directories are on an NFS mounted volume and > running Model.rebuld_index there seems to take 60-100 times longer > than it does on a local filesystem (e.g. /tmp). > > What we'd like to do is rebuild our indexes in say /tmp, then copy > them back when done to the NFS volume with the Rails directories. 
> > Is there any simple way to get Model.rebuild_index to build indexes > in a different place, or do we need to take another approach? If you use the DRb server it would be relatively easy since that (at least in svn trunk) does index rebuilds in a separate directory independent from the current index that is in use for searching. By default the rebuild directory is named RAILS_ROOT/index/class_name/rebuild should be relatively easy to hack or even make this a configuration option. Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From robert.head at comcast.net Tue May 22 09:39:52 2007 From: robert.head at comcast.net (Robert Head) Date: Tue, 22 May 2007 15:39:52 +0200 Subject: [Ferret-talk] Constant 0.11.4 Errors In-Reply-To: <2dae7e518520d8c900e857c2b97b4e35@ruby-forum.com> References: <6afb2b2d0ca571fb1836ad044fe13854@ruby-forum.com> <5d2e7c85e61024937428a281650070c2@ruby-forum.com> <76685bc50704100857l2ca72916m9372284d803cbfcf@mail.gmail.com> <846f30c70704111157q3a56f15aw96f68da682e066bb@mail.gmail.com> <846f30c70704121510w8cbd21ai86565bf9fc1d7050@mail.gmail.com> <20070413071258.GM16943@cordoba.webit.de> <846f30c70704161918k6fb45324m57a263a7278c34b7@mail.gmail.com> <7486fad0801d800ec91fdf0c44e48e16@ruby-forum.com> <291875d8acfb198df852c08cfc8dec35@ruby-forum.com> <2dae7e518520d8c900e857c2b97b4e35@ruby-forum.com> Message-ID: <4b0287932adec7ebff32a9f72c186ce9@ruby-forum.com> Vince W. wrote: > Hmm.. not wanting to make a mistake here so allow me to ask a silly > question. Do I just sudo gem install ferret, pick the 0.11.3 version, > and then erase the 0.11.4 gem? $ sudo gem uninstall ferret Select gem to uninstall: 1. ferret-0.11.3 2. ferret-0.11.4 3. 
All versions
> 2
Successfully uninstalled ferret version 0.11.4
Remove executables and scripts for 'ferret-browser' in addition to the gem? [Yn] n
Executables and scripts will remain installed.

--
Posted via http://www.ruby-forum.com/.

From michael.dershowitz at jpmchase.com Tue May 22 10:34:22 2007
From: michael.dershowitz at jpmchase.com (Mike Dershowitz)
Date: Tue, 22 May 2007 16:34:22 +0200
Subject: [Ferret-talk] Ferret file not found error on item delete
Message-ID: <3c2f26b061e470892d3a9734137526ec@ruby-forum.com>

Hi ferreters:

I've already posted this in rails, but someone there suggested this
list would be better.

I'm getting a ferret error when I try to delete any item that has been
previously indexed. I just installed ferret, so the indexes aren't big
or anything. What I'm trying to do is to just delete an item on a table
that has been indexed. Here's the error I'm getting:

Processing GoalsController#destroy (for 68.83.170.192 at 2007-05-22 08:05:39) [DELETE]
  Session ID: ae7fa224cbeac580bac6fa4c9c250a03
  Parameters: {"_method"=>"delete", "action"=>"destroy", "id"=>"105", "controller"=>"goals"}

Ferret::FileNotFoundError (File Not Found Error occured at :117 in xpop_context
Error occured in fs_store.c:329 - fs_open_input
        tried to open "/var/www/apps/goal_buddy/current/config/../index/production/goal/_2s_0.del" but it doesn't exist:
):
    /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:285:in `delete'
    /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:285:in `<<'
    /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:8:in `synchrolock'
    /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
    /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:8:in `synchrolock'
    /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:267:in `<<'
    /usr/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.0/lib/local_index.rb:140:in `<<'
    /usr/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.0/lib/instance_methods.rb:73:in `ferret_update'

As I really don't know ferret
and acts_as_ferret well, any help you could give would be greatly appreciated! Thanks! Mike -- Posted via http://www.ruby-forum.com/. From bk at benjaminkrause.com Tue May 22 11:14:59 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Tue, 22 May 2007 17:14:59 +0200 Subject: [Ferret-talk] Ferret file not found error on item delete In-Reply-To: <3c2f26b061e470892d3a9734137526ec@ruby-forum.com> References: <3c2f26b061e470892d3a9734137526ec@ruby-forum.com> Message-ID: <1BE4BF28-1552-48D5-A904-971B8D81BE30@benjaminkrause.com> Hey .. > I'm getting a ferret error when I try to delete any item that has been > previously indexed. I just installed ferret, so the indexes aren't > big > or anything. What I'm trying to do is to just delete an item on a > table > that has been index. Here's the error I'm getting: > > Processing GoalsController#destroy (for 68.83.170.192 at 2007-05-22 > 08:05:39) [DELETE] > Session ID: ae7fa224cbeac580bac6fa4c9c250a03 > Parameters: {"_method"=>"delete", "action"=>"destroy", "id"=>"105", > "controller"=>"goals"} i'm not sure i understand what you want to do .. if you want to clear your index, you can use Index :create => true option [1] The error you are getting can have various reasons, like two writers writing the same index. Can you post a little more information about what you want to do and maybe the code that throws the exception? Ben [1] http://ferret.davebalmain.com/api/classes/Ferret/Index/ IndexWriter.html From michael.dershowitz at jpmchase.com Tue May 22 11:30:32 2007 From: michael.dershowitz at jpmchase.com (Mike Dershowitz) Date: Tue, 22 May 2007 17:30:32 +0200 Subject: [Ferret-talk] Ferret file not found error on item delete In-Reply-To: <1BE4BF28-1552-48D5-A904-971B8D81BE30@benjaminkrause.com> References: <3c2f26b061e470892d3a9734137526ec@ruby-forum.com> <1BE4BF28-1552-48D5-A904-971B8D81BE30@benjaminkrause.com> Message-ID: <3b5ceab1e2745bbf8b2cef9df5a736d4@ruby-forum.com> Hi Ben: Thanks so much for getting back to me. 
I don't know if/how to find out if two writers are looking to write the same index at the same time. What it appears is that the item that needs to be deleted "goal" is an indexed item, and thus ferret must do something when an indexed item wants to be deleted. That's when I get the error - it's almost as if ferret is doing some good cleanup but then can't find the file to cleanup. Does ferret/acts as ferret delete an associated index when the item is deleted? If so how do I make that delete code stronger? Better yet, does recreating indexes solve the problem? If so, I didn't/don't really understand how interact with ferret such that I could force it to recreate the index, if that would solve the problem, so direction there would be helpful as well. Thanks very much again, and in advance, for your help! Mike Benjamin Krause wrote: > Hey .. > >> Parameters: {"_method"=>"delete", "action"=>"destroy", "id"=>"105", >> "controller"=>"goals"} > > i'm not sure i understand what you want to do .. if you want to clear > your > index, you can use Index :create => true option [1] > > The error you are getting can have various reasons, like two writers > writing the same index. Can you post a little more information about > what you want to do and maybe the code that throws the exception? > > Ben > > > > [1] http://ferret.davebalmain.com/api/classes/Ferret/Index/ > IndexWriter.html -- Posted via http://www.ruby-forum.com/. 
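
On the "two writers writing the same index" point raised above: the following is only a rough sketch of the general idea of letting one writer at a time touch an index directory, using nothing but Ruby's standard library. The helper names `with_index_lock` and `index_locked?` and the `writer.lock` file are made up for illustration; they are not part of Ferret or acts_as_ferret, and the sketch is independent of whatever locking Ferret does internally. It also assumes the index lives on a local filesystem where `File#flock` behaves normally.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not a Ferret/acts_as_ferret API): run the block while
# holding an exclusive lock on a lock file inside the index directory, so
# that only one process at a time modifies the index.
def with_index_lock(index_dir)
  FileUtils.mkdir_p(index_dir)
  lock_path = File.join(index_dir, "writer.lock")
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_EX) # waits until no other writer holds the lock
    begin
      yield
    ensure
      lock.flock(File::LOCK_UN)
    end
  end
end

# Hypothetical helper: true if another writer currently holds the lock.
def index_locked?(index_dir)
  lock_path = File.join(index_dir, "writer.lock")
  return false unless File.exist?(lock_path)
  File.open(lock_path, File::RDWR) do |lock|
    # With LOCK_NB, flock returns false instead of waiting when the lock
    # is already held through another file handle.
    acquired = lock.flock(File::LOCK_EX | File::LOCK_NB)
    lock.flock(File::LOCK_UN) if acquired
    !acquired
  end
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  puts "before: locked? #{index_locked?(dir)}"
  with_index_lock(dir) do
    puts "inside: locked? #{index_locked?(dir)}"
  end
  puts "after:  locked? #{index_locked?(dir)}"
end
```

Wrapping every index update like this would serialize writers across processes, at the cost of making them wait for each other; funnelling all index access through a single server process gets the same effect without a lock file.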
From kraemer at webit.de Tue May 22 11:36:08 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 22 May 2007 17:36:08 +0200 Subject: [Ferret-talk] Ferret file not found error on item delete In-Reply-To: <3b5ceab1e2745bbf8b2cef9df5a736d4@ruby-forum.com> References: <3c2f26b061e470892d3a9734137526ec@ruby-forum.com> <1BE4BF28-1552-48D5-A904-971B8D81BE30@benjaminkrause.com> <3b5ceab1e2745bbf8b2cef9df5a736d4@ruby-forum.com> Message-ID: <20070522153608.GJ31602@cordoba.webit.de> On Tue, May 22, 2007 at 05:30:32PM +0200, Mike Dershowitz wrote: > Hi Ben: > > Thanks so much for getting back to me. I don't know if/how to find out > if two writers are looking to write the same index at the same time. > What it appears is that the item that needs to be deleted "goal" is an > indexed item, and thus ferret must do something when an indexed item > wants to be deleted. That's when I get the error - it's almost as if > ferret is doing some good cleanup but then can't find the file to > cleanup. Does ferret/acts as ferret delete an associated index when the > item is deleted? If so how do I make that delete code stronger? Yes, acts_as_ferret is all about keeping your DB and the Ferret index in sync. > Better yet, does recreating indexes solve the problem? If so, I > didn't/don't really understand how interact with ferret such that I > could force it to recreate the index, if that would solve the problem, > so direction there would be helpful as well. you can rebuild acts_as_ferret's index from the Rails console with Goal.rebuild_index And yes, rebuilding the index should solve the problem. If this happened to you in production mode, be sure to check out acts_as_ferret's DRb server which will prevent problems like this. Jens -- Jens Kr?mer webit! 
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From michael.dershowitz at jpmchase.com Tue May 22 13:11:50 2007
From: michael.dershowitz at jpmchase.com (Mike Dershowitz)
Date: Tue, 22 May 2007 19:11:50 +0200
Subject: [Ferret-talk] Ferret file not found error on item delete
In-Reply-To: <20070522153608.GJ31602@cordoba.webit.de>
References: <3c2f26b061e470892d3a9734137526ec@ruby-forum.com> <1BE4BF28-1552-48D5-A904-971B8D81BE30@benjaminkrause.com> <3b5ceab1e2745bbf8b2cef9df5a736d4@ruby-forum.com> <20070522153608.GJ31602@cordoba.webit.de>
Message-ID:

Ok, well, thanks for that information. So, I added the rebuild index to
delete method for a goal, and now I get an error when I try and create a
goal - the same error! What's the proper way to deal with ferret when I
delete or create records - so far it's not been very helpful at all when
the index changes.

I also tried to execute an index rebuild from my command line and that
didn't make any change. Do I need to stop/start the server?

Thanks again for your help!

Mike

--
Posted via http://www.ruby-forum.com/.

From druse.jon at gmail.com Tue May 22 13:49:15 2007
From: druse.jon at gmail.com (Jon Druse)
Date: Tue, 22 May 2007 19:49:15 +0200
Subject: [Ferret-talk] Bug in Ferret::Search::SortField::SCORE ??
Message-ID:

i have been trying to get this to work for a while now. my controller
is

  sort = [ Ferret::Search::SortField::SCORE_REV ]
  @results = Record.multi_search(params[:search_terms], [ Link, Post, Event ], {:limit => :all, :sort => sort })

and in my view i just render a conglomeration of the appropriate
partials for each model. it seems that no matter what i do, i can't get
the results to be ordered by their ferret_score, even though i can
display that score just fine in my views. i'm really confused and
getting frustrated.
maybe someone can shed some light on it. thanks!

here is a sample model:

  acts_as_ferret :store_class_name => true, :fields => [:title, :body]

and the main view:

    <% for r in @results %>
    <%= r.ferret_score %>
    <%= render_partial Inflector.underscore(r.class), r %>
    <% end %> Jon -- Posted via http://www.ruby-forum.com/. From michael.dershowitz at jpmchase.com Tue May 22 19:51:30 2007 From: michael.dershowitz at jpmchase.com (Mike Dershowitz) Date: Wed, 23 May 2007 01:51:30 +0200 Subject: [Ferret-talk] Ferret file not found error on item delete In-Reply-To: References: <3c2f26b061e470892d3a9734137526ec@ruby-forum.com> <1BE4BF28-1552-48D5-A904-971B8D81BE30@benjaminkrause.com> <3b5ceab1e2745bbf8b2cef9df5a736d4@ruby-forum.com> <20070522153608.GJ31602@cordoba.webit.de> Message-ID: <0c97ba938ac3c1a89654ad062a5547a7@ruby-forum.com> Any additional help here? Thanks in advance! -- Posted via http://www.ruby-forum.com/. From arthur.chui at gmail.com Tue May 22 20:00:57 2007 From: arthur.chui at gmail.com (Arthur Chui) Date: Wed, 23 May 2007 02:00:57 +0200 Subject: [Ferret-talk] Memory Leak When Searching For Multilingual Keyword(s) Message-ID: <5f1ea5a0dc89486e72be3edf5aa3a4e3@ruby-forum.com> On Windows XP, I played the AAF demo (svn://projects.jkraemer.net/acts_as_ferret/trunk/demo) that works nicely with English content. However, if the keyword is non-English (no matter whether there is any content in db), the server immediately causes memory leak over 1GB and stops responding. The languages I tried include: French (utf-8) German (utf-8) Spanish (utf-8) Chinese (utf-8) Japanese (utf-8) Seems no existing online article / trick would solve the issue. I had already tried all the tricks: 1. MultiLingualFerretTools plugin 2. set in environment.rb: $KCODE = 'u' require 'jcode' ENV['LANG'] = 'de_DE.UTF-8 at euro' ENV['LC_TIME'] = 'C' -- Posted via http://www.ruby-forum.com/. From parra06 at gmail.com Tue May 22 22:25:12 2007 From: parra06 at gmail.com (Marcello parra) Date: Wed, 23 May 2007 04:25:12 +0200 Subject: [Ferret-talk] Accented characters Message-ID: Hello, I want to clean up accented characters in my index, using acts_as_ferret in a Rails project. I searched this forum, and found the best solution is to use an analyser. 
I created something like this:

class PortugueseAnalyzer
  include Ferret::Analysis

  MAPPING = {
    ['?','?','?','?','?','?','?','?'] => 'a',
    '?' => 'ae',
    ['?','?'] => 'd',
    ['?','?','?','?','?'] => 'c',
    ['?','?','?','?','?','?','?','?','?',] => 'e',
    ['?'] => 'f',
    ['?','?','?','?'] => 'g',
    ['?','?'] => 'h',
    ['?','?','?','?','?','?','?','?'] => 'i',
    ['?','?','?','?'] => 'j',
    ['?','?'] => 'k',
    ['?','?','?','?','?'] => 'l',
    ['?','?','?','?','?','?'] => 'n',
    ['?','?','?','?','?','?','?','?','?','?'] => 'o',
    ['?'] => 'oek',
    ['?'] => 'q',
    ['?','?','?'] => 'r',
    ['?','?','?','?','?'] => 's',
    ['?','?','?','?'] => 't',
    ['?','?','?','?','?','?','?','?','?','?'] => 'u',
    ['?'] => 'w',
    ['?','?','?'] => 'y',
    ['?','?','?'] => 'z'
  }

  def token_stream(field, string)
    return MappingFilter.new(StandardTokenizer.new(string), MAPPING)
  end
end

And inserted this code at the end of environment.rb.

In my model:

  acts_as_ferret({ :fields => [ 'name' ] }, :analyzer => PortugueseAnalyzer.new)

But this did not work....

Can someone tell me what I did wrong ????

Thanks

Marcello

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Wed May 23 03:38:34 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 23 May 2007 09:38:34 +0200
Subject: [Ferret-talk] Ferret file not found error on item delete
In-Reply-To:
References: <3c2f26b061e470892d3a9734137526ec@ruby-forum.com> <1BE4BF28-1552-48D5-A904-971B8D81BE30@benjaminkrause.com> <3b5ceab1e2745bbf8b2cef9df5a736d4@ruby-forum.com> <20070522153608.GJ31602@cordoba.webit.de>
Message-ID: <20070523073834.GK31602@cordoba.webit.de>

On Tue, May 22, 2007 at 07:11:50PM +0200, Mike Dershowitz wrote:
> Ok, well, thanks for that information. So, I added the rebuild index to
> delete method for a goal, and now I get an error when I try and create a
> goal - the same error! What's the proper way to deal with ferret when I
> delete or create records - so far it's not been very helpful at all when
> the index changes.
You got me wrong here - you definitely should not rebuild the index every time a goal is deleted. > I also tried to execute an index rebuild from my command line and that > didn't make any change. Do I need to stop/start the server? Is this on your development machine or in a production setup? Usually there is no need to call rebuild_index manually in development mode, just stop the server, delete the index directory and start the server again. In production mode, use the DRb server and use rebuild_index via the rails console if needed. If this still doesn't work out we would need some more information (logs) to see what's going on. Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Wed May 23 03:52:49 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 23 May 2007 09:52:49 +0200 Subject: [Ferret-talk] Accented characters In-Reply-To: References: Message-ID: <20070523075249.GL31602@cordoba.webit.de> On Wed, May 23, 2007 at 04:25:12AM +0200, Marcello parra wrote: > Hello, > > I want to clean up accented characters in my index, using acts_as_ferret > in a Rails project. I searched this forum, and found the best solution > is to use an analyser. > I created somthing like this: > > class PortugueseAnalyzer Try inheriting your analyzer from Ferret::Analysis::Analyzer. Does not seem to be necessary API-wise, but imho this should help. Jens -- Jens Kr?mer webit! 
Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Wed May 23 03:57:05 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 23 May 2007 09:57:05 +0200 Subject: [Ferret-talk] Memory Leak When Searching For Multilingual Keyword(s) In-Reply-To: <5f1ea5a0dc89486e72be3edf5aa3a4e3@ruby-forum.com> References: <5f1ea5a0dc89486e72be3edf5aa3a4e3@ruby-forum.com> Message-ID: <20070523075705.GM31602@cordoba.webit.de> On Wed, May 23, 2007 at 02:00:57AM +0200, Arthur Chui wrote: > On Windows XP, I played the AAF demo > (svn://projects.jkraemer.net/acts_as_ferret/trunk/demo) that works > nicely with English content. However, if the keyword is non-English (no > matter whether there is any content in db), the server immediately > causes memory leak over 1GB and stops responding. The languages I tried > include: Afair this is a win32 related problem, I never saw this behaviour on Linux. Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Wed May 23 04:40:26 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 23 May 2007 10:40:26 +0200 Subject: [Ferret-talk] Bug in Ferret::Search::SortField::SCORE ?? In-Reply-To: References: Message-ID: <20070523084026.GA18062@cordoba.webit.de> On Tue, May 22, 2007 at 07:49:15PM +0200, Jon Druse wrote: > i have been trying to get this to work for a while now. my controller > is > > sort = [ Ferret::Search::SortField::SCORE_REV ] > @results = Record.multi_search(params[:search_terms], [ Link, Post, > Event ], {:limit => :all, :sort => sort }) > > and in my view i just render a conglomeration of the appropriate > partials for each model. 
it seems that no matter what i do, i can't get > the results to be ordered by their ferret_score, even though i can > display that score just fine in my views. i'm really confused and > getting frustrated. maybe someone can shed some light on it. thanks! What version of aaf/Ferret do you use? I just added some more tests to aaf trunk and it seems to work ok. Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From parra06 at gmail.com Wed May 23 05:42:04 2007 From: parra06 at gmail.com (Marcello parra) Date: Wed, 23 May 2007 11:42:04 +0200 Subject: [Ferret-talk] Accented characters In-Reply-To: <20070523075249.GL31602@cordoba.webit.de> References: <20070523075249.GL31602@cordoba.webit.de> Message-ID: <2c2f71f78a646da047a159833f1441b1@ruby-forum.com> > Try inheriting your analyzer from Ferret::Analysis::Analyzer. Does not > seem to be necessary API-wise, but imho this should help. > > Jens > Thanks Jens. I changed from "class PortugueseAnalyzer" to "class PortugueseAnalyzer < Ferret::Analysis::Analyzer", but did not work also.... Did I put this in the right place ?? Thanks Marcello -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed May 23 05:46:05 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 23 May 2007 11:46:05 +0200 Subject: [Ferret-talk] Accented characters In-Reply-To: <2c2f71f78a646da047a159833f1441b1@ruby-forum.com> References: <20070523075249.GL31602@cordoba.webit.de> <2c2f71f78a646da047a159833f1441b1@ruby-forum.com> Message-ID: <20070523094605.GA20147@cordoba.webit.de> On Wed, May 23, 2007 at 11:42:04AM +0200, Marcello parra wrote: > > Try inheriting your analyzer from Ferret::Analysis::Analyzer. Does not > > seem to be necessary API-wise, but imho this should help. > > > > Jens > > > > > Thanks Jens. 
> I changed from "class PortugueseAnalyzer" to > "class PortugueseAnalyzer < Ferret::Analysis::Analyzer", > but did not work also.... > > Did I put this in the right place ?? I think so. To help debugging this a small ruby skript reproducing the exact problem would be cool. Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From parra06 at gmail.com Wed May 23 06:21:29 2007 From: parra06 at gmail.com (Marcello parra) Date: Wed, 23 May 2007 12:21:29 +0200 Subject: [Ferret-talk] Accented characters In-Reply-To: <20070523094605.GA20147@cordoba.webit.de> References: <20070523075249.GL31602@cordoba.webit.de> <2c2f71f78a646da047a159833f1441b1@ruby-forum.com> <20070523094605.GA20147@cordoba.webit.de> Message-ID: > I think so. To help debugging this a small ruby skript reproducing the > exact problem would be cool. > Jens, In the log, I get: creating doc for class: Conta, id: 164 Adding field name with value 'Jos?? Antonio' to index So, the name is not being traslated from UTF to ascii.... It's the same output if I did not use the Analyzer. Thanks -- Posted via http://www.ruby-forum.com/. From parra06 at gmail.com Wed May 23 06:43:21 2007 From: parra06 at gmail.com (Marcello parra) Date: Wed, 23 May 2007 12:43:21 +0200 Subject: [Ferret-talk] Accented characters In-Reply-To: References: <20070523075249.GL31602@cordoba.webit.de> <2c2f71f78a646da047a159833f1441b1@ruby-forum.com> <20070523094605.GA20147@cordoba.webit.de> Message-ID: > In the log, I get: > > creating doc for class: Conta, id: 164 > Adding field name with value 'Jos?? Antonio' to index I included a word preju?zo... that should be translated to prejuizo... I put some code to output information when it builds the index. 
This is what I get:

Analyzing: field:nome str:preju??zo
token["preju":0:5:1]
token["zo":7:9:1]

So, the problem is that it breaks the word in two, right at the accented
character...

I guess the problem is in:

  def token_stream(field, string)
    return MappingFilter.new(StandardTokenizer.new(string), MAPPING)
  end

But I can't figure out how.....

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Wed May 23 07:37:53 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 23 May 2007 13:37:53 +0200
Subject: [Ferret-talk] Accented characters
In-Reply-To:
References: <20070523075249.GL31602@cordoba.webit.de> <2c2f71f78a646da047a159833f1441b1@ruby-forum.com> <20070523094605.GA20147@cordoba.webit.de>
Message-ID: <20070523113752.GC20147@cordoba.webit.de>

On Wed, May 23, 2007 at 12:43:21PM +0200, Marcello parra wrote:
> > In the log, I get:
> >
> > creating doc for class: Conta, id: 164
> > Adding field name with value 'Jos?? Antonio' to index
>
>
> I included a word prejuízo... that should be translated to prejuizo...
> I put some code to output information when it builds the index. This is
> what I get:
>
> Analyzing: field:nome str:preju??zo
> token["preju":0:5:1]
> token["zo":7:9:1]

With the script at http://pastie.caboo.se/63808 I get:

token["prejuizo":0:9:1]

It seems that Ferret doesn't recognize the í as a character and
therefore splits the word at this position. You have to make sure that
everything in your environment is using UTF-8 as character encoding for
these things to work (locale settings are especially relevant to Ferret).

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From info at ifhere.org Wed May 23 14:36:00 2007
From: info at ifhere.org (j.
weir) Date: Wed, 23 May 2007 20:36:00 +0200 Subject: [Ferret-talk] Constant 0.11.4 Errors In-Reply-To: <6afb2b2d0ca571fb1836ad044fe13854@ruby-forum.com> References: <6afb2b2d0ca571fb1836ad044fe13854@ruby-forum.com> Message-ID: I am getting the same error using Ubuntu 6.10 and acts_as_ferret. I can reindex just fine, but when I resave some records I get the fs_store, File Not Found Error. downgrading to 11.3 seems to be working. > 1) Error: > test_published_blog_can_be_ferreted(BlogTest): > Ferret::FileNotFoundError: File Not Found Error occured at > :117 in xpop_context > Error occured in fs_store.c:329 - fs_open_input > tried to open > "/Users/mark/Sites/www.site.com/config/../index/test/blog/_17_0.del" but > it doesn't exist: > -- Posted via http://www.ruby-forum.com/. From patcito at gmail.com Wed May 23 16:07:29 2007 From: patcito at gmail.com (Patrick Aljord) Date: Wed, 23 May 2007 22:07:29 +0200 Subject: [Ferret-talk] issues with searching custom fields Message-ID: <6b6419750705231307s4abe3ce7je77884f1a7a5bde8@mail.gmail.com> Hey all, I'm using acts_as_ferret this way: class Job < ActiveRecord::Base acts_as_ferret :fields => [:title, :workers_name] def workers_name return self.workers.inject("") {|names,b| names + " " + b.first_name}.to_s end end But when I do Job.find_by_contents("workers_name:patrick") I get nil. Yet when I do: j=Job.find :first j.workers_name I do get a worker (among others) which name is patrick. Any idea why I don't get it with find_by_contents? thanx in advance Pat From solaris at sundevil.de Thu May 24 03:31:59 2007 From: solaris at sundevil.de (Hendrik Volkmer) Date: Thu, 24 May 2007 09:31:59 +0200 Subject: [Ferret-talk] Strange Problem with AAF DRB connection Message-ID: <86e407d8ae1513562c40a81bc43fb626@ruby-forum.com> Hi all! We use the DRB-Server Backend and are getting strange DRb::DRbConnErrors lately. 
It started with: too large packet 687865856 (druby:/10.0.0.10:9010) /usr/lib/ruby/1.8/drb/drb.rb:573:in `load' and later only this one: premature marshal format(can't read) (druby:/10.0.0.10:9010) /usr/lib/ruby/1.8/drb/drb.rb:580:in `load' Do you have any ideas what that could be? We didn't change so much regarding aaf. Maybe we put some more fields in the index, that should be it. We're using the following versions of ferret, aaf, and rails: ferret - 0.11.3 (0.11.4 resulted in other, strange problems) aaf - 0.4 Rails - 1.2.3 Cheers, Hendrik -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu May 24 05:39:57 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 24 May 2007 11:39:57 +0200 Subject: [Ferret-talk] Strange Problem with AAF DRB connection In-Reply-To: <86e407d8ae1513562c40a81bc43fb626@ruby-forum.com> References: <86e407d8ae1513562c40a81bc43fb626@ruby-forum.com> Message-ID: <20070524093957.GB8909@cordoba.webit.de> On Thu, May 24, 2007 at 09:31:59AM +0200, Hendrik Volkmer wrote: > Hi all! > > We use the DRB-Server Backend and are getting strange DRb::DRbConnErrors > lately. It started with: > > too large packet 687865856 > (druby:/10.0.0.10:9010) /usr/lib/ruby/1.8/drb/drb.rb:573:in `load' > > and later only this one: > > premature marshal format(can't read) > (druby:/10.0.0.10:9010) /usr/lib/ruby/1.8/drb/drb.rb:580:in `load' > > Do you have any ideas what that could be? We didn't change so much > regarding aaf. Maybe we put some more fields in the index, that should > be it. strange - the first one seems to be a request way to large, and in the second case the request payload has been shorter than expected - are you sure you don't have any network issues? Any hints on when this happens (i.e. high load, special actions) ? Jens -- Jens Kr?mer webit! 
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Thu May 24 05:40:50 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 24 May 2007 11:40:50 +0200
Subject: [Ferret-talk] issues with searching custom fields
In-Reply-To: <6b6419750705231307s4abe3ce7je77884f1a7a5bde8@mail.gmail.com>
References: <6b6419750705231307s4abe3ce7je77884f1a7a5bde8@mail.gmail.com>
Message-ID: <20070524094050.GC8909@cordoba.webit.de>

On Wed, May 23, 2007 at 10:07:29PM +0200, Patrick Aljord wrote:
> Hey all,
>
> I'm using acts_as_ferret this way:
> class Job < ActiveRecord::Base
> acts_as_ferret :fields => [:title, :workers_name]
>
> def workers_name
> return self.workers.inject("") {|names,b| names + " " + b.first_name}.to_s
> end
> end
>
> But when I do Job.find_by_contents("workers_name:patrick")
> I get nil.
>
> Yet when I do:
> j=Job.find :first
> j.workers_name
> I do get a worker (among others) which name is patrick.
>
> Any idea why I don't get it with find_by_contents?

have you tried rebuilding your index? You have to do so after changing aaf options.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From patcito at gmail.com Thu May 24 09:35:09 2007
From: patcito at gmail.com (Patrick Aljord)
Date: Thu, 24 May 2007 15:35:09 +0200
Subject: [Ferret-talk] issues with searching custom fields
In-Reply-To: <20070524094050.GC8909@cordoba.webit.de>
References: <6b6419750705231307s4abe3ce7je77884f1a7a5bde8@mail.gmail.com> <20070524094050.GC8909@cordoba.webit.de>
Message-ID: <6b6419750705240635k67b60219ob5fc174f586f091e@mail.gmail.com>

On 5/24/07, Jens Kraemer wrote:
> have you tried rebuilding your index?
You have to do so after changing > aaf options. yes indeed I did rm -rf index/ yesterday and it worked :) From giuseppe.bertini at gmail.com Thu May 24 11:33:48 2007 From: giuseppe.bertini at gmail.com (Giuseppe Bertini) Date: Thu, 24 May 2007 17:33:48 +0200 Subject: [Ferret-talk] Search scoping in acts_as_ferret Message-ID: Hello, I am exploring acts_as_ferret, and the first question that pops to mind is scoping, i.e. how to restrict searches in various ways. For example, I have a Post model with title, content, and user_id as attributes, and I want users to be able to search through their own posts only. Normally, of course, I would do Post.find(:all, :conditions=>["user_id=?", current_user], :joins=>"somejoins") What about ferret-ized models? Is there a clean way to impose such restrictions to the ferret search itself, or should one search the entire model first, and then use the returned ids to run a regular query with a "in" clause? Apologies if the answer to this is trivial, I cannot seem to google the relevant info. Cheers, Giuseppe -- Posted via http://www.ruby-forum.com/. From doug.arogos at gmail.com Thu May 24 11:56:44 2007 From: doug.arogos at gmail.com (Doug Smith) Date: Thu, 24 May 2007 08:56:44 -0700 Subject: [Ferret-talk] Search scoping in acts_as_ferret In-Reply-To: References: Message-ID: <42d8808f0705240856l31837a77rffea21a5d8810401@mail.gmail.com> Just add something like this to your query string: "user_id:10" I typically use a drop-down on the view to let the user choose limiting values like this, then I concatenate the field values in the query string before calling find_by_contents or whatever. You can find out a lot more about other query related options in the Lucene documentation too: http://lucene.apache.org/java/docs/queryparsersyntax.html Thanks, Doug On 5/24/07, Giuseppe Bertini wrote: > > Hello, > I am exploring acts_as_ferret, and the first question that pops to mind > is scoping, i.e. how to restrict searches in various ways. 
>
> For example, I have a Post model with title, content, and user_id as
> attributes, and I want users to be able to search through their own
> posts only.
>
> Normally, of course, I would do
> Post.find(:all, :conditions=>["user_id=?", current_user],
> :joins=>"somejoins")
>
> What about ferret-ized models?
> Is there a clean way to impose such restrictions to the ferret search
> itself, or should one search the entire model first, and then use the
> returned ids to run a regular query with a "in" clause?
>
> Apologies if the answer to this is trivial, I cannot seem to google the
> relevant info.
>
> Cheers,
> Giuseppe
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

From rages at shaw.ca Thu May 24 14:30:04 2007
From: rages at shaw.ca (Kyle)
Date: Thu, 24 May 2007 20:30:04 +0200
Subject: [Ferret-talk] Constant 0.11.4 Errors
In-Reply-To:
References: <6afb2b2d0ca571fb1836ad044fe13854@ruby-forum.com>
Message-ID: <5eeb7fdf445633125317421e31136fad@ruby-forum.com>

I actually still get this error in 0.11.3, I have to revert to 0.10.14 to get it remotely working.

Does anyone have any advice on what exactly the issue is? or how it can be worked around. I would love to be as close to the recent devel as possible =).

j. weir wrote:
> I am getting the same error using Ubuntu 6.10 and acts_as_ferret.
>
> I can reindex just fine, but when I resave some records I get the
> fs_store, File Not Found Error.
>
> downgrading to 11.3 seems to be working.
> > >> 1) Error: >> test_published_blog_can_be_ferreted(BlogTest): >> Ferret::FileNotFoundError: File Not Found Error occured at >> :117 in xpop_context >> Error occured in fs_store.c:329 - fs_open_input >> tried to open >> "/Users/mark/Sites/www.site.com/config/../index/test/blog/_17_0.del" but >> it doesn't exist: >> -- Posted via http://www.ruby-forum.com/. From patcito at gmail.com Thu May 24 22:27:35 2007 From: patcito at gmail.com (Patrick Aljord) Date: Fri, 25 May 2007 04:27:35 +0200 Subject: [Ferret-talk] how to update index with acts_as_ferret? Message-ID: <6b6419750705241927q6524881fm3a01e0c609bed459@mail.gmail.com> Hey all, I have movie has_many :medias and media belongs_to :media this is how my movie class looks like: class Movie < ActiveRecord::Base has_many :medias acts_as_ferret :fields => [:title,:medias_name] def medias_name return self.medias.inject("") {|name,m| name + " " + m.name} end end when I do Movie.find_by_contents("title:bob") it does return a movie and if I modify that movie title to Bill and do Movie.find_by_contents("title:bill") it will return the movie also. But if I create a new media xyz and try to search on it Movie.find_by_contents("medias_name:xyz") it doesn't find it, I need to rm -rf index/ and restart the server to make it find it. Any idea how I can update the index on special fields? thanx in advance Pat From kraemer at webit.de Sat May 26 05:56:44 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 26 May 2007 11:56:44 +0200 Subject: [Ferret-talk] how to update index with acts_as_ferret? In-Reply-To: <6b6419750705241927q6524881fm3a01e0c609bed459@mail.gmail.com> References: <6b6419750705241927q6524881fm3a01e0c609bed459@mail.gmail.com> Message-ID: <20070526095644.GB28949@cordoba.webit.de> Hi! 
On Fri, May 25, 2007 at 04:27:35AM +0200, Patrick Aljord wrote:
> Hey all,
> I have movie has_many :medias and media belongs_to :media
>
> this is how my movie class looks like:
>
> class Movie < ActiveRecord::Base
> has_many :medias
>
> acts_as_ferret :fields => [:title,:medias_name]
>
> def medias_name
> return self.medias.inject("") {|name,m| name + " " + m.name}
> end
>
> end
>
> when I do Movie.find_by_contents("title:bob") it does return a movie
> and if I modify that movie title to Bill and do
> Movie.find_by_contents("title:bill") it will return the movie also.
>
> But if I create a new media xyz and try to search on it
> Movie.find_by_contents("medias_name:xyz") it doesn't find it, I need
> to rm -rf index/ and restart the server to make it find it. Any idea
> how I can update the index on special fields?

you can't update single fields, you have to reindex that movie to reflect the changes. An after_save hook in media calling movie.ferret_update should do the trick.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From jjm at codewell.com Mon May 28 11:16:04 2007
From: jjm at codewell.com (Jeff Mallatt)
Date: Mon, 28 May 2007 11:16:04 -0400
Subject: [Ferret-talk] external highlighter/excerpter
Message-ID: <7.0.1.0.2.20070528111406.03a30d78@codewell.com>

Has anyone seen an implementation or example of a highlighter/excerpter that works on external (non-stored) fields?

Thanks.

From snowstorm+rubyforum at gmail.com Tue May 29 00:33:17 2007
From: snowstorm+rubyforum at gmail.com (Yaxm Yaxm)
Date: Tue, 29 May 2007 06:33:17 +0200
Subject: [Ferret-talk] is "IN" a special word?
Message-ID:

Hi, I am trying to do a search for a field that contains the word "in" or "IN", but ferret doesn't return me any result.
class User < ActiveRecord::Base acts_as_ferret :fields => { :user => {:store => :no }, :len => {:store => :yes} } end ruby script/console >> User.find_by_contents('Cal') => #"Cal Poly", "id"=>"1", "len"=>nil}>]> >> u = User.new => #nil, "len"=>nil}> >> u.user = 'IN' => "IN" >> u.save => true >> User.find_by_contents('IN') => # >> User.rebuild_index => {} >> User.find_by_contents('IN') => # >> u.user = 'in' => "in" >> u.save => true >> User.find_by_contents('\i\n') => # >> User.find_by_contents('in') => # so is "in" or "IN" a special word? What can I do to make them appear in my search result? Is there a list of all the special words? Thanks. Yaxm -- Posted via http://www.ruby-forum.com/. From bk at benjaminkrause.com Tue May 29 04:23:46 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Tue, 29 May 2007 10:23:46 +0200 Subject: [Ferret-talk] is "IN" a special word? In-Reply-To: References: Message-ID: <20C3B2C6-B55F-4FB6-8AEB-9A82143D86AC@benjaminkrause.com> On May 29, 2007, at 06:33, Yaxm Yaxm wrote: > Hi, I am trying to do a search for a field that contains the word "in" > or "IN", but ferret doesn't return me any result. looks like it is a stop word: irb(main):003:0> Ferret::Analysis::FULL_ENGLISH_STOP_WORDS.include? 'in' => true stop words will not be indexed .. Ben From derek.haynes at gmail.com Tue May 29 11:58:14 2007 From: derek.haynes at gmail.com (Derek Haynes) Date: Tue, 29 May 2007 08:58:14 -0700 Subject: [Ferret-talk] index#term_docs returns no docs / term_docs_for does Message-ID: <8eee6840705290858l5f20f98cr3677a287e0e2ab7@mail.gmail.com> I'm building my first Filter and I'm running into an issue with Ferret::Index::IndexReader#term_docs. As I understand it, index_reader.term_docs should return a term-document enumerator for the entire index: index_reader.term_docs => empty set However, I'm getting an empty set. 
Interestingly enough, the following: index_reader.term_docs_for(:name, 'coffee') => lots of documents Is returning plenty of documents. I'm expecting #term_docs to return an enumerator that allows me to step through the entire index - any ideas on what I'm doing wrong? I'm using acts_as_ferret and I'm on version 0.10.13 of the gem on OSX. Thanks in advance for any assistance, Derek -- Derek Haynes Highgroove Studios - http://www.highgroove.com San Francisco, CA | Atlanta, GA Keeping it Simple. 404.593.4879 Slingshot - Ruby on Rails Business Hosting http://www.slingshothosting.com From tompata at gmail.com Tue May 29 13:02:49 2007 From: tompata at gmail.com (Tamas Tompa) Date: Tue, 29 May 2007 19:02:49 +0200 Subject: [Ferret-talk] Memory leak Windows XP SP2 related to search involving ' In-Reply-To: <815dc27deae7c18f984874ca1b7b5977@ruby-forum.com> References: <815dc27deae7c18f984874ca1b7b5977@ruby-forum.com> Message-ID: <422667c5c332d6333b50d1ee8aef1b7e@ruby-forum.com> Hello, I've the same problem, same platform: windows xp, same versions... There's a serious memory leak, while processing the unicode search! Is there any hope, that you will fix this problem under win xp? Can i help you? (i can send logs, any information), but it's very important! Please answer, Thank you, Tamas -- Posted via http://www.ruby-forum.com/. From john at digitalpulp.com Tue May 29 12:59:20 2007 From: john at digitalpulp.com (John Bachir) Date: Tue, 29 May 2007 12:59:20 -0400 Subject: [Ferret-talk] When does ferret / AAF decide to reindex? Message-ID: <8AAF7868-3BB8-4904-8680-64711B08D5EE@digitalpulp.com> I am experience situations where accessing the index results in a complete re-indexing of the model. I have not been able to detect a pattern. Under what circumstances does Ferret (or AAF) decide that it needs to rebuild the index? I'll be happy to look at the code relevant to this if someone could direct me to it. 
Thanks, John From plynchnlm at gmail.com Tue May 29 17:06:51 2007 From: plynchnlm at gmail.com (Paul Lynch) Date: Tue, 29 May 2007 17:06:51 -0400 Subject: [Ferret-talk] Query strings and stop words Message-ID: <50d6c72a0705291406u1d6b6d90tb029add4e162845e@mail.gmail.com> Is there an option for filtering stop words out of the query string, so that queries that contain stop words don't return zero results? It has been several years, but I think that when I was on a project that was writing a search engine, we used to filter stop words both out of the index and out of the query string, after it was parsed. This allows a query like [cold and sinus] to return hits even when the default operator is AND. If there is no option for this in ferret, is there some way of getting at ferret's parsed query structure, and deleting stop words before it tries to use it? I would like to avoid parsing the query myself, because that duplicates what ferret does. (At the very least, I would like to avoid writing a ferret query language parser!) Thanks, --Paul -- Paul Lynch Aquilent, Inc. National Library of Medicine (Contractor) From matt at expectedbehavior.com Tue May 29 23:09:11 2007 From: matt at expectedbehavior.com (Matt Gordon) Date: Tue, 29 May 2007 23:09:11 -0400 Subject: [Ferret-talk] When does ferret / AAF decide to reindex? In-Reply-To: <8AAF7868-3BB8-4904-8680-64711B08D5EE@digitalpulp.com> References: <8AAF7868-3BB8-4904-8680-64711B08D5EE@digitalpulp.com> Message-ID: <465CEAD7.5010000@expectedbehavior.com> I'm fairly certain that a complete re-index only occurs when either a) the index can't be found at the given 'index_dir', or b) Model.rebuild_index is called. The code I looked at was in ClassMethods#aaf_index which, if you follow it into 'create_index_instance', does index on creation. Hopefully that can at least put you on the track to debugging your problem. 
Matt John Bachir wrote: > I am experience situations where accessing the index results in a > complete re-indexing of the model. I have not been able to detect a > pattern. > > Under what circumstances does Ferret (or AAF) decide that it needs to > rebuild the index? I'll be happy to look at the code relevant to this > if someone could direct me to it. > > Thanks, > John > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From vivio at hotmail.com Wed May 30 06:46:07 2007 From: vivio at hotmail.com (Vivio) Date: Wed, 30 May 2007 12:46:07 +0200 Subject: [Ferret-talk] Ferret write lock persistent after model update Message-ID: <43d8c0042bc60ef42e6d1bb181132748@ruby-forum.com> Hiya, I got my Rails app with full text search using acts_as_ferret. But I'm experiencing trouble when updating my models, a ferret-write.lck file appears in the index dir and every subsequent search from the rails console hangs. If I update my model from the console, I get no problems at all. I'm running a single rails process in development mode. -- Posted via http://www.ruby-forum.com/. From ramon.pperez at gmail.com Wed May 30 10:33:27 2007 From: ramon.pperez at gmail.com (Ramon) Date: Wed, 30 May 2007 16:33:27 +0200 Subject: [Ferret-talk] How to search with limit by field Message-ID: <6330ef94ae27b584e409faf8031bded8@ruby-forum.com> Hello, I have a ferret index with 2 fields: Acts_as_ferret :fields => [:client, :content] If I do model.find_by_contents(query) I obtain all results by the query but I would like to obtain 3 results for each client. Any ideas? Thanks for all. -- Posted via http://www.ruby-forum.com/. 
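
What Ramon describes (at most three results for each client) can also be expressed as post-processing of a single result list. The following is only a minimal sketch in plain Ruby, not acts_as_ferret code: Hit and limit_per_client are hypothetical names, and the hits array stands in for whatever a search such as find_by_contents returned, assumed to be ordered by relevance already.

```ruby
# Hypothetical stand-in for a search hit; a real application would use
# the model objects returned by the search instead.
Hit = Struct.new(:client, :content)

# Keep at most `per_client` hits for each client. Hits are assumed to be
# ordered by relevance already, so the first ones seen per client win.
def limit_per_client(hits, per_client = 3)
  seen = Hash.new(0)
  hits.select do |hit|
    seen[hit.client] += 1
    seen[hit.client] <= per_client
  end
end

hits = [
  Hit.new(1, "a"), Hit.new(1, "b"), Hit.new(2, "c"),
  Hit.new(1, "d"), Hit.new(1, "e"), Hit.new(2, "f")
]

limited = limit_per_client(hits)
limited.each { |hit| puts "client #{hit.client}: #{hit.content}" }
```

The catch with this approach is that the single search has to return enough hits for every client to reach its quota, so it only makes sense when the result list is requested with a generous limit.
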
From doug.arogos at gmail.com Wed May 30 12:45:28 2007
From: doug.arogos at gmail.com (Doug Smith)
Date: Wed, 30 May 2007 09:45:28 -0700
Subject: [Ferret-talk] How to search with limit by field
In-Reply-To: <6330ef94ae27b584e409faf8031bded8@ruby-forum.com>
References: <6330ef94ae27b584e409faf8031bded8@ruby-forum.com>
Message-ID: <42d8808f0705300945w359bf4d8ib7fddb8e69178682@mail.gmail.com>

Hi Ramon,

I think you'd have to do three different queries:

query = params[:query]
@results1 = model.find_by_contents("client:1 content:#{query}", {:limit => 3})
@results2 = model.find_by_contents("client:2 content:#{query}", {:limit => 3})
@results3 = model.find_by_contents("client:3 content:#{query}", {:limit => 3})

Ferret is fast enough that this shouldn't be a performance problem.

Thanks,

Doug

On 5/30/07, Ramon wrote:
>
> Hello,
>
> I have a ferret index with 2 fields:
>
> Acts_as_ferret :fields => [:client, :content]
>
> If I do model.find_by_contents(query) I obtain all results by the query
> but I would like to obtain 3 results for each client.
>
> Any ideas?
>
> Thanks for all.
>
> --
> Posted via http://www.ruby-forum.com/.
>

From ror at philippeapril.com Wed May 30 13:40:27 2007
From: ror at philippeapril.com (Philippe April)
Date: Wed, 30 May 2007 13:40:27 -0400
Subject: [Ferret-talk] A way to get all the words from an index?
Message-ID:

Hi,

I am just wondering if there's a way to get all the words from an index. Basically, all the words that have been indexed (excluding the stopwords if I'm using the stopwords analyzer, etc.)

The fields I'm putting in are not :stored in the index.

The idea is to implement a "did you mean?"
mechanism, which is based on the content of the index, not on a dictionary...

Possible?

Thank you!
Philippe April

From john at digitalpulp.com Wed May 30 15:49:25 2007
From: john at digitalpulp.com (John Bachir)
Date: Wed, 30 May 2007 15:49:25 -0400
Subject: [Ferret-talk] A way to get all the words from an index?
In-Reply-To:
References:
Message-ID: <882CE2F2-7707-49FE-A7C2-E4E56BFD52C7@digitalpulp.com>

On May 30, 2007, at 1:40 PM, Philippe April wrote:
> I am just wondering if there's a way to get all the words from an
> index. Basically, all the words that have been indexed (excluding the
> stopwords if I'm using the stopwords analyzer, etc.)

perhaps something like this:

term_hash = {}
Resource.aaf_index.ferret_index.reader.terms(:body).each {|t, f| term_hash[t] = f }
th_sorted = term_hash.sort {|a,b| a[1]<=>b[1]}.reverse

Cheers,
John

From john at digitalpulp.com Wed May 30 15:51:01 2007
From: john at digitalpulp.com (John Bachir)
Date: Wed, 30 May 2007 15:51:01 -0400
Subject: [Ferret-talk] A way to get all the words from an index?
In-Reply-To: <882CE2F2-7707-49FE-A7C2-E4E56BFD52C7@digitalpulp.com>
References: <882CE2F2-7707-49FE-A7C2-E4E56BFD52C7@digitalpulp.com>
Message-ID: <61271AA8-0137-4C69-88F1-1067F06BB85C@digitalpulp.com>

On May 30, 2007, at 3:49 PM, John Bachir wrote:
> Resource.aaf_index.ferret_index.reader.terms(:body).each {|t, f|
> term_hash[t] = f }

(Resource is the model being indexed)

From ror at philippeapril.com Wed May 30 18:26:06 2007
From: ror at philippeapril.com (Philippe April)
Date: Wed, 30 May 2007 18:26:06 -0400
Subject: [Ferret-talk] A way to get all the words from an index?
In-Reply-To: <882CE2F2-7707-49FE-A7C2-E4E56BFD52C7@digitalpulp.com>
References: <882CE2F2-7707-49FE-A7C2-E4E56BFD52C7@digitalpulp.com>
Message-ID: <8DF063D6-BCD3-4BEF-8974-B228B5923DFC@philippeapril.com>

John,

This is exactly what I've been looking for... I guess I didn't know about the reader!

Thank you,
Philippe

On 30-May-07, at 3:49 PM, John Bachir wrote:
> On May 30, 2007, at 1:40 PM, Philippe April wrote:
>> I am just wondering if there's a way to get all the words from an
>> index. Basically, all the words that have been indexed (excluding the
>> stopwords if I'm using the stopwords analyzer, etc.)
>
>
> perhaps something like this:
>
> term_hash = {}
> Resource.aaf_index.ferret_index.reader.terms(:body).each {|t, f|
> term_hash[t] = f }
> th_sorted = term_hash.sort {|a,b| a[1]<=>b[1]}.reverse
>
> Cheers,
> John
>

From yura at brainhouse.ru Wed May 30 18:37:45 2007
From: yura at brainhouse.ru (Yury Kotlyarov)
Date: Thu, 31 May 2007 00:37:45 +0200
Subject: [Ferret-talk] aaf and dynamic attrs: a bug?
Message-ID: <7347f6ac6f3808df0e986006099aa340@ruby-forum.com>

Hi!

I faced some issue while using it for dynamic attrs indexing/search. Maybe I made something wrong. Here is test method. Everything works just fine until last line http://pastie.caboo.se/66274 . Tested on both stable and trunk of aaf and ferret 0.11.4.
the short version of code below: Contact.acts_as_ferret :fields => [ :first_name ] assert Contact.find(:first).respond_to?(:first_name_to_ferret) assert_equal 1, Contact.find_by_contents('Y*').total_hits assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits Contact.aaf_index.close FileUtils.rm_rf 'index' Contact.acts_as_ferret :fields => [ :first_name, :last_name ] assert Contact.find(:first).respond_to?(:last_name_to_ferret) assert_equal 1, Contact.find_by_contents('Y*').total_hits assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits assert_equal 1, Contact.find_by_contents('last_name:K*').total_hits assert_equal 1, Contact.find_by_contents('K*').total_hits # assertion fails here: get 0 instead of 1 !! Maybe there is any better solution to add new fields to existing index? Best regards, Yury Kotlyarov -- Posted via http://www.ruby-forum.com/. From dagnan at gmail.com Wed May 30 19:47:58 2007 From: dagnan at gmail.com (Dagnan) Date: Thu, 31 May 2007 01:47:58 +0200 Subject: [Ferret-talk] Problem compiling dev version for windows Message-ID: Hi! I'm trying to compile the svn version (0.10.5.1), but I'm experimenting a lot of troubles (I had to install Visual C++ Express 2005, etc.) - Windows XP SP2 Has anyone already done it? It would be very nice if you coul help me. Actually, I need it to solve the highlight bug (I also get the segfault message when I use the highlight function). Thanks! -- Posted via http://www.ruby-forum.com/. 
From dagnan at gmail.com Wed May 30 19:53:04 2007 From: dagnan at gmail.com (Michel Dagnan) Date: Thu, 31 May 2007 01:53:04 +0200 Subject: [Ferret-talk] Problem compiling dev version for windows In-Reply-To: References: Message-ID: <9f2ca28df46314c0241dc9af5b43fc2a@ruby-forum.com> I want to precise something: the problem is during the nmake process (nmake is okay, but not the thing the compilator does with cl.exe) Microsoft (R) Program Maintenance Utility Version 8.00.50727.42 Copyright (C) Microsoft Corporation. All rights reserved. cl -nologo -I. -IE:/ruby/lib/ruby/1.8/i386-mswin32 -IE:/ruby/lib/ruby/1. 8/i386-mswin32 -I. -MD -Zi -O2b2xg- -G6 -c -Tcanalysis.c cl : Command line warning D9035 : option 'Og-' has been deprecated and will be r emoved in a future release cl : Command line warning D9002 : ignoring unknown option '-G6' analysis.c analysis.c : fatal error C1902: Program database manager mismatch; please check your installation NMAKE : fatal error U1077: '"C:\Program Files\Microsoft Visual Studio 8\VC\bin\c l.EXE"' : return code '0x2' Stop. -- Posted via http://www.ruby-forum.com/. From john at squirl.info Wed May 30 21:10:11 2007 From: john at squirl.info (John Mcgrath) Date: Thu, 31 May 2007 03:10:11 +0200 Subject: [Ferret-talk] iterate through an entire index Message-ID: I'm trying to get all the documents in an index. I've been hunting around, but I don't see a clear way to do this. I can get docs by searching on a term, or by specific doc id, but having trouble getting the whole pile of them. I'm using AAF and Ferret 0.11.4. Any help appreciated. John -- Posted via http://www.ruby-forum.com/. 
From bk at benjaminkrause.com Thu May 31 03:57:01 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Thu, 31 May 2007 09:57:01 +0200 Subject: [Ferret-talk] iterate through an entire index In-Reply-To: References: Message-ID: <5A48FD2E-0857-400C-A6B0-4A14C85315B2@benjaminkrause.com> On May 31, 2007, at 03:10, John Mcgrath wrote: > I'm trying to get all the documents in an index. I've been hunting > around, but I don't see a clear way to do this. I can get docs by > searching on a term, or by specific doc id, but having trouble getting > the whole pile of them. I'm using AAF and Ferret 0.11.4. Any help > appreciated. Ferret 0.11.4 introduced a ferret-browser (try ferret- browser on the shell) the code for the ferret browser is part of the gem and is entirely in ruby .. you should find an example on how to iterate through all documents in that code .. if you cant find it, i can take a look for you .. Ben From joergd at pobox.com Thu May 31 10:13:53 2007 From: joergd at pobox.com (Joerg Diekmann) Date: Thu, 31 May 2007 16:13:53 +0200 Subject: [Ferret-talk] script/runner permission denied Message-ID: <8f5a6961fc47c470d7944f77b9c7cc99@ruby-forum.com> Hi, I have been running ferret for several months now, but want to make use of the new Drb functionality. So I am running Rails Stable (off the branch), Ferret 0.11.4, and the latest AAF. I have a ferret_server_yml file in my RAILS_ROOT/config folder: production: host: localhost port: 9009 pid_file: log/ferret.pid And I took the three scripts from the AAF plugin and put them into my RAILS_ROOT/scripts folder. I try to start the server with RAILS_ENV=production ruby script/ferret_start And get the following error: env: script/runner: Permission denied I have tried starting it using sudo (just for kicks) and didn't make a difference. I've searched everywhere, but am completely stumped. Has anybody come across a problem like this? Joerg P.S. I'm running the latest OSX on Intel. -- Posted via http://www.ruby-forum.com/. 
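
The message "env: script/runner: Permission denied" is what env prints when it is asked to execute a file that lacks the execute bit. One plausible cause here, assuming script/ferret_start begins with a "#!/usr/bin/env script/runner" line (an assumption, since the script itself is not shown in this thread), is that script/runner is not executable. A minimal sketch of checking and fixing that from Ruby follows; ensure_executable is a hypothetical helper, and the temp file merely stands in for script/runner.

```ruby
require 'tempfile'

# Hypothetical helper: add the owner's execute bit if it is missing.
# Returns true when the mode was changed, false when nothing was needed.
def ensure_executable(path)
  mode = File.stat(path).mode & 07777   # keep only the permission bits
  return false if mode & 0100 != 0      # owner execute bit already set
  File.chmod(mode | 0100, path)
  true
end

# Demonstration on a temp file, which is created without execute permission.
# In a Rails project the path would be 'script/runner' instead.
file = Tempfile.new('runner')
file.close
changed = ensure_executable(file.path)
owner_exec = File.stat(file.path).mode & 0100 != 0
puts "changed: #{changed}, owner-executable now: #{owner_exec}"
```

From a shell, chmod +x script/runner has the same effect. If script/runner is already executable, the cause lies elsewhere and this sketch does not apply.
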
From yura at brainhouse.ru Thu May 31 13:05:02 2007 From: yura at brainhouse.ru (Yury Kotlyarov) Date: Thu, 31 May 2007 21:05:02 +0400 Subject: [Ferret-talk] aaf and dynamic attrs: a bug? Message-ID: <465F003E.3070406@brainhouse.ru> Hi! I faced some issue while using it for dynamic attrs indexing/search. Maybe I made something wrong. Here is test method. Everything works just fine until last line http://pastie.caboo.se/66274 . Tested on both stable and trunk of aaf and ferret 0.11.4. the short version of code below: Contact.acts_as_ferret :fields => [ :first_name ] assert Contact.find(:first).respond_to?(:first_name_to_ferret) assert_equal 1, Contact.find_by_contents('Y*').total_hits assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits Contact.aaf_index.close FileUtils.rm_rf 'index' Contact.acts_as_ferret :fields => [ :first_name, :last_name ] assert Contact.find(:first).respond_to?(:last_name_to_ferret) assert_equal 1, Contact.find_by_contents('Y*').total_hits assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits assert_equal 1, Contact.find_by_contents('last_name:K*').total_hits assert_equal 1, Contact.find_by_contents('K*').total_hits # assertion fails here: get 0 instead of 1 !! Maybe there is any better solution to add new fields to existing index? -- Best regards, Yury Kotlyarov <;-) BrainHouse web: http://www.brainhouse.ru email: yura at brainhouse.ru From john at digitalpulp.com Thu May 31 14:30:37 2007 From: john at digitalpulp.com (John Bachir) Date: Thu, 31 May 2007 14:30:37 -0400 Subject: [Ferret-talk] complete index rebuild using AAF trunk Message-ID: <88EF8DAF-8AA3-4DD8-961D-BAD6D091852E@digitalpulp.com> I am using AAF trunk, and I want a way to rebuild an index on a production site with little or no interruption to service. 
The Drb Server documentation* states that when an index is rebuilt, it is done in a separate location and then swapped into place when finished, and so to do a complete rebuild on a live site, one must take into consideration objects which have been created or modified in the meantime. To achieve this, I have come up with the following solution: http://pastie.textmate.org/66602 [1] Does this look like a complete solution? I suppose it relies on timestamp consistency between system components... it is possible that between setting "start = ..." and performing the rebuild, another thread in the system will have create an earlier timestamp for an object that did not get committed until after the rebuild began. Is it possible to do a perfect rebuild, or would that require building a layer of concurrency logic into AAF? [2] Is the behavior described in the Drb Server documentation different from AAF when not using the Drb Server? Thanks, John * http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer#AAFtrunk From kyle at casttv.com Thu May 31 16:12:40 2007 From: kyle at casttv.com (Kyle Maxwell) Date: Thu, 31 May 2007 13:12:40 -0700 Subject: [Ferret-talk] complete index rebuild using AAF trunk In-Reply-To: <88EF8DAF-8AA3-4DD8-961D-BAD6D091852E@digitalpulp.com> References: <88EF8DAF-8AA3-4DD8-961D-BAD6D091852E@digitalpulp.com> Message-ID: <47699a8d0705311312u29c03c6egbaa8902fd3e18e30@mail.gmail.com> > [1] Does this look like a complete solution? I suppose it relies on > timestamp consistency between system components... it is possible > that between setting "start = ..." and performing the rebuild, > another thread in the system will have create an earlier timestamp > for an object that did not get committed until after the rebuild > began. Is it possible to do a perfect rebuild, or would that require > building a layer of concurrency logic into AAF? You can sync your server clocks using ntpd, and you can always update a few extra seconds to work around latency. 
-Kyle From yura at brainhouse.ru Thu May 31 16:37:00 2007 From: yura at brainhouse.ru (Yury Kotlyarov) Date: Fri, 01 Jun 2007 00:37:00 +0400 Subject: [Ferret-talk] aaf and dynamic attrs: a bug? In-Reply-To: <465F003E.3070406@brainhouse.ru> References: <465F003E.3070406@brainhouse.ru> Message-ID: <465F31EC.5060609@brainhouse.ru> send all code we have so far and full test case code. --- model: contact.rb class Contact < ActiveRecord::Base end --- migration: 001_create_contacts_table.rb class CreateContacts < ActiveRecord::Migration def self.up create_table :contacts do |t| t.column :first_name, :string t.column :last_name, :string end end def self.down drop_table :contacts end end --- fixture: contacts.yml renat: id: 1 first_name: Renat last_name: Akhmerov yura: id: 2 first_name: Yury last_name: Kotlyarov --- test: contact_test.rb require File.dirname(__FILE__) + '/../test_helper' require 'fileutils' class ContactTest < Test::Unit::TestCase fixtures :contacts def setup if File.exists?('index') FileUtils.rm_rf('index') end end def test_new_field Contact.acts_as_ferret :fields => [ :first_name ] assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits assert_equal 1, Contact.find_by_contents('Y*').total_hits Contact.aaf_index.close FileUtils.rm_rf('index') Contact.acts_as_ferret :fields => [ :first_name, :last_name ] assert_equal 1, Contact.find_by_contents('last_name:K*').total_hits # it fails on the following line!! a bug here? assert_equal 1, Contact.find_by_contents('K*').total_hits end end From yura at brainhouse.ru Thu May 31 17:19:48 2007 From: yura at brainhouse.ru (Yury Kotlyarov) Date: Fri, 01 Jun 2007 01:19:48 +0400 Subject: [Ferret-talk] aaf and dynamic attrs: a bug? 
In-Reply-To: <465F31EC.5060609@brainhouse.ru> References: <465F003E.3070406@brainhouse.ru> <465F31EC.5060609@brainhouse.ru> Message-ID: <465F3BF4.6060406@brainhouse.ru> Direct access to the index works - thanks to Thomas Nichols for the idea --- test: contact_test.rb require File.dirname(__FILE__) + '/../test_helper' require 'fileutils' require 'ferret' class ContactTest < Test::Unit::TestCase fixtures :contacts def setup if File.exists?('index') FileUtils.rm_rf('index') end end def test_new_field Contact.acts_as_ferret :fields => [ :first_name ] assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits assert_equal 1, Contact.find_by_contents('Y*').total_hits Contact.aaf_index.close FileUtils.rm_rf('index') Contact.acts_as_ferret :fields => [ :first_name, :last_name ] assert_equal 1, Contact.find_by_contents('last_name:K*').total_hits # it fails on the following line!! a bug here? #assert_equal 1, Contact.find_by_contents('K*').total_hits idx = Ferret::Index::Index.new(:path => 'index/test/contact', :create_if_missing => false) assert_equal 1, idx.search('last_name:K*').total_hits assert_equal 1, idx.search('K*').total_hits end end Yury Kotlyarov wrote: > send all code we have so far and full test case code.
> > --- model: contact.rb > class Contact < ActiveRecord::Base > end > > --- migration: 001_create_contacts_table.rb > class CreateContacts < ActiveRecord::Migration > def self.up > create_table :contacts do |t| > t.column :first_name, :string > t.column :last_name, :string > end > end > > def self.down > drop_table :contacts > end > end > > --- fixture: contacts.yml > renat: > id: 1 > first_name: Renat > last_name: Akhmerov > yura: > id: 2 > first_name: Yury > last_name: Kotlyarov > > --- test: contact_test.rb > > require File.dirname(__FILE__) + '/../test_helper' > require 'fileutils' > > class ContactTest < Test::Unit::TestCase > fixtures :contacts > > def setup > if File.exists?('index') > FileUtils.rm_rf('index') > end > end > > def test_new_field > Contact.acts_as_ferret :fields => [ :first_name ] > assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits > assert_equal 1, Contact.find_by_contents('Y*').total_hits > > Contact.aaf_index.close > FileUtils.rm_rf('index') > > Contact.acts_as_ferret :fields => [ :first_name, :last_name ] > > assert_equal 1, Contact.find_by_contents('last_name:K*').total_hits > # it fails on the following line!! a bug here? 
> assert_equal 1, Contact.find_by_contents('K*').total_hits > end > end > > -- Best regards, Yury Kotlyarov <;-) BrainHouse web: http://www.brainhouse.ru email: yura at brainhouse.ru skype: yura__115 phone: +7 905 758 1491 jabber: yury.kotlyarov at jabber.ru From john at johnleach.co.uk Thu May 31 17:44:26 2007 From: john at johnleach.co.uk (John Leach) Date: Thu, 31 May 2007 22:44:26 +0100 Subject: [Ferret-talk] Ferret.donate(Money.aus_dollar(200)) Message-ID: <1180647866.18832.48.camel@localhost.localdomain> Remember folks, we can support the Ferret project by donating warm soft electronic cash to the author, Dave Balmain, using the paypal buttons on the website: http://ferret.davebalmain.com/trac http://ferret.davebalmain.com/trac/wiki/DonationsFAQ We can also buy the Ferret Shortcut pdf/book from O'Reilly, also written by Dave Balmain. It's awesome good: http://www.oreilly.com/catalog/9780596527853/index.html Many of us probably use Ferret via the acts_as_ferret Rails plugin by Jens Kraemer. He doesn't stipulate how he'd like to be supported, so until he chooses to clarify otherwise, I'd recommend that if you see him in the street buy him lunch[1]. He looks like this: http://www.xing.com/profile/Jens_Kraemer2 Keep a close eye out for him. Happy Ferreting, John[2]. [1] Check his dietary requirements first. [2] Random disinterested third party - not involved in the consumption of donations. -- http://johnleach.co.uk From yura at brainhouse.ru Thu May 31 18:00:14 2007 From: yura at brainhouse.ru (Yury Kotlyarov) Date: Fri, 01 Jun 2007 02:00:14 +0400 Subject: [Ferret-talk] aaf and dynamic attrs: a bug?
WORKAROUND In-Reply-To: <465F3BF4.6060406@brainhouse.ru> References: <465F003E.3070406@brainhouse.ru> <465F31EC.5060609@brainhouse.ru> <465F3BF4.6060406@brainhouse.ru> Message-ID: <465F456E.1070606@brainhouse.ru> Got a workaround! Adding the following line solves the problem: Contact.aaf_index.ferret_index.options[:default_field] << 'last_name' So is it a bug in aaf? --- here is the full test: require File.dirname(__FILE__) + '/../test_helper' require 'fileutils' class ContactTest < Test::Unit::TestCase fixtures :contacts def setup if File.exists?('index') FileUtils.rm_rf('index') end end def test_new_field Contact.acts_as_ferret :fields => [ :first_name ] assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits assert_equal 1, Contact.find_by_contents('Y*').total_hits Contact.acts_as_ferret :fields => [ :first_name, :last_name ] Contact.aaf_index.close FileUtils.rm_rf('index') Contact.aaf_index.ferret_index.options[:default_field] << 'last_name' assert_equal 1, Contact.find_by_contents('K*').total_hits end end Yury Kotlyarov wrote: > direct access to the index works - thanks to Thomas Nichols for the idea > > --- test: contact_test.rb > require File.dirname(__FILE__) + '/../test_helper' > require 'fileutils' > require 'ferret' > > class ContactTest < Test::Unit::TestCase > fixtures :contacts > > def setup > if File.exists?('index') > FileUtils.rm_rf('index') > end > end > > def test_new_field > Contact.acts_as_ferret :fields => [ :first_name ] > assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits > assert_equal 1, Contact.find_by_contents('Y*').total_hits > > Contact.aaf_index.close > FileUtils.rm_rf('index') > > Contact.acts_as_ferret :fields => [ :first_name, :last_name ] > > assert_equal 1, Contact.find_by_contents('last_name:K*').total_hits > > # it fails on the following line!! a bug here?
> #assert_equal 1, Contact.find_by_contents('K*').total_hits > > idx = Ferret::Index::Index.new(:path => 'index/test/contact', > :create_if_missing => false) > assert_equal 1, idx.search('last_name:K*').total_hits > assert_equal 1, idx.search('K*').total_hits > end > end > > > Yury Kotlyarov wrote: > >> send all code we have so far and full test case code. >> >> --- model: contact.rb >> class Contact < ActiveRecord::Base >> end >> >> --- migration: 001_create_contacts_table.rb >> class CreateContacts < ActiveRecord::Migration >> def self.up >> create_table :contacts do |t| >> t.column :first_name, :string >> t.column :last_name, :string >> end >> end >> >> def self.down >> drop_table :contacts >> end >> end >> >> --- fixture: contacts.yml >> renat: >> id: 1 >> first_name: Renat >> last_name: Akhmerov >> yura: >> id: 2 >> first_name: Yury >> last_name: Kotlyarov >> >> --- test: contact_test.rb >> >> require File.dirname(__FILE__) + '/../test_helper' >> require 'fileutils' >> >> class ContactTest < Test::Unit::TestCase >> fixtures :contacts >> >> def setup >> if File.exists?('index') >> FileUtils.rm_rf('index') >> end >> end >> >> def test_new_field >> Contact.acts_as_ferret :fields => [ :first_name ] >> assert_equal 1, Contact.find_by_contents('first_name:Y*').total_hits >> assert_equal 1, Contact.find_by_contents('Y*').total_hits >> >> Contact.aaf_index.close >> FileUtils.rm_rf('index') >> >> Contact.acts_as_ferret :fields => [ :first_name, :last_name ] >> >> assert_equal 1, Contact.find_by_contents('last_name:K*').total_hits >> # it fails on the following line!! a bug here?
>> assert_equal 1, Contact.find_by_contents('K*').total_hits >> end >> end >> > > -- Best regards, Yury Kotlyarov <;-) BrainHouse web: http://www.brainhouse.ru email: yura at brainhouse.ru skype: yura__115 phone: +7 905 758 1491 jabber: yury.kotlyarov at jabber.ru
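
The workaround in this thread appends 'last_name' to the index's :default_field option. Below is a minimal sketch of why that could make a bare query such as 'K*' start matching. It uses a toy in-memory index, not Ferret or acts_as_ferret. ToyIndex and its methods are hypothetical names, and the underlying assumption (not confirmed anywhere in the thread) is that a query without a field prefix is only tried against the configured default fields.

```ruby
# Toy model only: this is NOT Ferret or acts_as_ferret code.
# ToyIndex, #add and #search are hypothetical names for illustration.
class ToyIndex
  attr_reader :default_fields

  def initialize(default_fields)
    @default_fields = default_fields.map { |f| f.to_s }
    @docs = []
  end

  # Store a document given as a Hash of field name => value.
  def add(doc)
    @docs << doc
  end

  # Accepts "field:Prefix*" or a bare "Prefix*".
  # Assumption: a bare query is tried only against the default fields.
  def search(query)
    field, term = query.include?(':') ? query.split(':', 2) : [nil, query]
    prefix = term.chomp('*').downcase
    fields = field ? [field] : @default_fields
    @docs.select do |doc|
      fields.any? { |f| doc[f].to_s.downcase.start_with?(prefix) }
    end
  end
end

# The index is set up while only first_name is known as a default field.
index = ToyIndex.new([:first_name])
index.add('first_name' => 'Renat', 'last_name' => 'Akhmerov')
index.add('first_name' => 'Yury',  'last_name' => 'Kotlyarov')

puts index.search('last_name:K*').size  # 1: the field is named explicitly
puts index.search('K*').size            # 0: last_name is not a default field
index.default_fields << 'last_name'     # analogous to the workaround above
puts index.search('K*').size            # 1: the bare query now covers last_name
```

If aaf behaves like this toy model, it would fit the symptoms reported above: the field-qualified query passes, the bare query fails until the default field list is extended, and a freshly opened index built from the current field list also matches.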