From kraemer at webit.de Thu Feb 1 03:30:09 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 1 Feb 2007 09:30:09 +0100
Subject: [Ferret-talk] Automatically Indexing Associated Models
Message-ID: <20070201083009.GH21355@cordoba.webit.de>

On Thu, Feb 01, 2007 at 02:42:38AM +0100, Mark wrote:
> PROBLEM
> I have two models, Blog and BlogComment. When a blog is initially
> created, it has no comments. Upon creation, the title and body are
> automatically added to the ferret index and directly searchable.
> However, when a comment is added to a blog, that comment does not get
> added to the index and is therefore not ferretable. The desired behavior
> is that when a comment is added to a blog, that the comment be
> ferretable.
>
> CURRENT SETUP
> Blog (id, title, body, user_id)
> BlogComment (id, blog_id, comment)
>
> class Blog < ActiveRecord::Base
>   has_many :blog_comments, :dependent => :destroy
>   acts_as_ferret :additional_fields => [:blog_comments]
>
>   def blog_comments
>     self.blog_comments.collect {|comment| comment.body }
>   end
>
> end
>
> class BlogComment < ActiveRecord::Base
>   belongs_to :blog
> end
>
> CONTROLLER
> [..]
> if @blog.blog_comments << comment
>   do_something

adding
  @blog.ferret_update
here should do the trick.

> else
>   do_something
> end
>

Jens

-- 
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From dan_at_works at yahoo.com Thu Feb 1 07:57:31 2007
From: dan_at_works at yahoo.com (Ngoc Ngoc)
Date: Thu, 1 Feb 2007 13:57:31 +0100
Subject: [Ferret-talk] Searcher do not work or I do not work

Hi.
I want to learn more about ferret. So I downloaded ferret-0.10.14 and write a simple test script

Only query = TermQuery.new(:content, 'program') gives result.
If I change 'program' with 'Good' or 'Extra' -> no result
and searching on (:title, 'Ruby') -> no result

Strange, Strange

Here is the script
-------
require 'rubygems'
require 'ferret'
include Ferret
include Ferret::Search
include Ferret::Index

index = Index.new(:path => './index')
index << {:location => 'here', :title => 'Programming Ruby', :content => 'Good excellent program' }
index << {:location => 'local', :title => 'Programming Rubyist', :content => 'Extra ordinary program' }
index.close()

searcher = Searcher.new('./index')
query = TermQuery.new(:content, 'Good')

searcher.search_each(query) do |id, score|
  doc = searcher[id]
  puts "Document #{id} found with a score of #{score}"
  puts doc[:content]
end

-- 
Posted via http://www.ruby-forum.com/.

From andreas.korth at gmx.net Thu Feb 1 08:26:14 2007
From: andreas.korth at gmx.net (Andreas Korth)
Date: Thu, 1 Feb 2007 14:26:14 +0100
Subject: [Ferret-talk] Searcher do not work or I do not work

On 01.02.2007, at 13:57, Ngoc Ngoc wrote:

> I want to learn more about ferret. So I downloaded ferret-0.10.14 and
> write a simple test script
>
> Only query = TermQuery.new(:content, 'program') gives result.
> If I change 'program' with 'Good' or 'Extra' -> no result
> and searching on (:title, 'Ruby') -> no result
>
>
> query = TermQuery.new(:content, 'Good')
>
> searcher.search_each(query) do |id, score|
>   doc = searcher[id]
>   puts "Document #{id} found with a score of #{score}"
>   puts doc[:content]
> end

Try 'good' in lowercase and it'll work.

The reason is that your index converts each word in your document to lowercase. This is due to the default analyzer used by the index, which happens to be Ferret::Analysis::StandardAnalyzer (see the rdocs for details). Because of the way you build your query, this lowercase conversion is not applied to your query string, hence no match.
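A small plain-Ruby model of the mismatch described above, with no Ferret dependency. The `ToyIndex` class and its methods are hypothetical stand-ins, not Ferret's API: `term_query` looks a term up verbatim (as a hand-built TermQuery does), while `parsed_query` first runs the text through the same lowercasing analyzer that was used at index time (as Index#search does). Ferret's StandardAnalyzer does more than lowercasing, which this sketch leaves out.

```ruby
# Hypothetical toy inverted index; not Ferret's API.
class ToyIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = [] } # [field, term] => doc ids
    @docs = []
  end

  # Stand-in analyzer: split into words and lowercase them.
  def analyze(text)
    text.scan(/\w+/).map(&:downcase)
  end

  # Index a document: every field value goes through the analyzer.
  def <<(doc)
    id = @docs.size
    @docs << doc
    doc.each do |field, value|
      analyze(value).each { |term| @postings[[field, term]] << id }
    end
    self
  end

  # Like a hand-built TermQuery: the term is used exactly as given.
  def term_query(field, term)
    @postings.fetch([field, term], []).uniq
  end

  # Like Index#search: the query text goes through the same analyzer.
  def parsed_query(field, text)
    analyze(text).flat_map { |term| term_query(field, term) }.uniq
  end
end

index = ToyIndex.new
index << { title: 'Programming Ruby', content: 'Good excellent program' }
index << { title: 'Programming Rubyist', content: 'Extra ordinary program' }

p index.term_query(:content, 'Good')   # prints []
p index.parsed_query(:content, 'Good') # prints [0]
```

The raw lookup misses because only the lowercased term `good` was ever stored; the analyzed lookup finds document 0, which is the 'Good' vs 'good' difference from the original script.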

The trick is that your queries need to go through the same analyzer that is used for indexing in order to get the desired results.

Let me suggest a better way to write this script:

require 'rubygems'
require 'ferret'
include Ferret::Index

index = Index.new
index << { :location => 'here', :title => 'Programming Ruby', :content => 'Good excellent program' }
index << { :location => 'local', :title => 'Programming Rubyist', :content => 'Extra ordinary program' }

docs = index.search('Good')
docs.hits.each do |hit|
  puts hit.inspect
  puts "Document #{hit.doc} found with a score of #{hit.score}"
  puts index[hit.doc][:content]
end

As you see, you don't need to employ a Searcher, nor do you have to build a TermQuery explicitly. Just call #search on your Index and everything works as expected (plus you get all the nice features of Ferret's query parser). Since we have used the Index#search method, the search automatically uses the same analyzer for parsing your query that was used for indexing the document.

Cheers,
Andy

From blah at blah.com Thu Feb 1 08:54:41 2007
From: blah at blah.com (Mark)
Date: Thu, 1 Feb 2007 14:54:41 +0100
Subject: [Ferret-talk] Automatically Indexing Associated Models
In-Reply-To: <20070201083009.GH21355@cordoba.webit.de>
References: <20070201083009.GH21355@cordoba.webit.de>
Message-ID: <3d0d9a26211961a3e92e9b7d8dad6446@ruby-forum.com>

Jens Kraemer wrote:
>> if @blog.blog_comments << comment
>>   do_something
>
> adding
>   @blog.ferret_update
> here should do the trick.
>
>> else
>>   do_something
>> end

Jens, that's one option I had considered. What do you think about
creating an onsave event in all the models that use acts_as_ferret like
so..

after_save :update_ferret_index

def update_ferret_index
  self.blog.ferret_update if self.blog
end
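A plain-Ruby sketch of the callback approach Mark proposes, where saving a comment triggers a re-index of its parent blog. ActiveRecord, `after_save` and acts_as_ferret's `ferret_update` are replaced by hypothetical in-memory stand-ins (`FakeIndex` and hand-written `save` methods) so the example runs on its own; in the real app the callback would be declared with `after_save :update_ferret_index` in BlogComment. The sketch also names the method that joins the comment texts `comments_text` rather than `blog_comments`: a method with the association's name, as in the original Blog model, replaces the association reader, so `self.blog_comments` inside it calls itself until the stack overflows. It reads each comment's `comment` attribute, matching the BlogComment columns listed in the first message.

```ruby
# Hypothetical stand-ins only: FakeIndex, Blog and BlogComment mimic the
# shape of the models in this thread; they are not ActiveRecord or
# acts_as_ferret classes.
class FakeIndex
  def initialize
    @docs = {}
  end

  # Replace the whole document stored under key.
  def update(key, fields)
    @docs[key] = fields
  end

  def [](key)
    @docs[key]
  end
end

INDEX = FakeIndex.new

class Blog
  attr_reader :id, :title, :comments

  def initialize(id, title)
    @id = id
    @title = title
    @comments = []
  end

  # Distinct name from the association, so it cannot recurse into itself.
  def comments_text
    comments.map(&:comment).join(' ')
  end

  # Stand-in for acts_as_ferret's ferret_update: re-index this blog.
  def ferret_update
    INDEX.update([:blog, id], title: title, comments_text: comments_text)
  end
end

class BlogComment
  attr_reader :blog, :comment

  def initialize(blog, comment)
    @blog = blog
    @comment = comment
  end

  # Stand-in for ActiveRecord's save followed by the after_save callback.
  def save
    blog.comments << self
    update_ferret_index
    true
  end

  private

  # The callback body from Mark's message.
  def update_ferret_index
    blog.ferret_update if blog
  end
end

blog = Blog.new(1, 'Hello')
blog.ferret_update
BlogComment.new(blog, 'Nice post about ferrets').save
p INDEX[[:blog, 1]][:comments_text] # prints "Nice post about ferrets"
```

Putting the hook on the child model keeps the controller free of indexing calls, which is the architectural advantage Jens points out in his reply.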

From dan_at_works at yahoo.com Thu Feb 1 09:19:08 2007
From: dan_at_works at yahoo.com (ngoc)
Date: Thu, 1 Feb 2007 15:19:08 +0100
Subject: [Ferret-talk] Searcher do not work or I do not work

Thanks Andy
Your code is efficient
>
> docs = index.search('Good')
> docs.hits.each do |hit|
>   puts hit.inspect
>   puts "Document #{hit.doc} found with a score of #{hit.score}"
>   puts index[hit.doc][:content]
> end
>

From kraemer at webit.de Thu Feb 1 09:30:45 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 1 Feb 2007 15:30:45 +0100
Subject: [Ferret-talk] Automatically Indexing Associated Models
In-Reply-To: <3d0d9a26211961a3e92e9b7d8dad6446@ruby-forum.com>
References: <20070201083009.GH21355@cordoba.webit.de> <3d0d9a26211961a3e92e9b7d8dad6446@ruby-forum.com>
Message-ID: <20070201143045.GJ21355@cordoba.webit.de>

On Thu, Feb 01, 2007 at 02:54:41PM +0100, Mark wrote:
> Jens Kraemer wrote:
>
> >> if @blog.blog_comments << comment
> >>   do_something
> >
> > adding
> >   @blog.ferret_update
> > here should do the trick.
> >
> >> else
> >>   do_something
> >> end
>
> Jens, that's one option I had considered. What do you think about
> creating an onsave event in all the models that use acts_as_ferret like
> so..
>
> after_save :update_ferret_index
>
> def update_ferret_index
>   self.blog.ferret_update if self.blog
> end

Yeah, this should work, too. Even prettier from an architectural point of view :-)

Jens

From tennisbum2002 at hotmail.com Fri Feb 2 02:17:05 2007
From: tennisbum2002 at hotmail.com (Aryk Grosz)
Date: Fri, 2 Feb 2007 08:17:05 +0100
Subject: [Ferret-talk] Getting "ArgumentError ( isn't a valid directory argume
In-Reply-To: <002a6391f3b20ac93d068e6ce6f998c0@ruby-forum.com>
References: <3986844f3d800df30733372554d993a4@ruby-forum.com> <28b1b191392270b0d5545555885147df@ruby-forum.com> <20061217220707.GA21628@cordoba.webit.de> <860f8413835b75e43b8b0c866b78ad04@ruby-forum.com> <002a6391f3b20ac93d068e6ce6f998c0@ruby-forum.com>
Message-ID: <6b7d087a8742b63941fc099bbc5b3d4b@ruby-forum.com>

So will the 0.10.11 version of this plugin fix this problem? I'm getting the error sporadically after the index was already built. I can't seem to figure out what's causing it.

From peter at ioffer.com Fri Feb 2 12:42:53 2007
From: peter at ioffer.com (peter)
Date: Fri, 02 Feb 2007 09:42:53 -0800
Subject: [Ferret-talk] Error: uninitialized constant LockError

Hey Guys.

So, I get this error every so often, and from peeking at the code, it seems that it could be an easy fix: it's looking for a LockError, but I think it's a LockException.

I'm on 0.10.13 with a very large index (about a million docs) that's staged on several servers, which are updated daily. (Using rails 1.1.6)

Thanks for all the hard work.

Here's the stack trace:

/usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:133:in `const_missing'
/usr/lib/ruby/gems/1.8/gems/ferret-0.10.13/lib/ferret/index.rb:674:in `ensure_reader_open'
/usr/lib/ruby/gems/1.8/gems/ferret-0.10.13/lib/ferret/index.rb:383:in `[]'
/usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
/usr/lib/ruby/gems/1.8/gems/ferret-0.10.13/lib/ferret/index.rb:382:in `[]'
/var/ruby/ioffer.com/app/current/app/models/item.rb:878:in `convert_search_hits_to_items'
/var/ruby/ioffer.com/app/current/app/models/item.rb:877:in `convert_search_hits_to_items'
/var/ruby/ioffer.com/app/current/app/controllers/search_controller.rb:221:in `list'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:941:in `perform_action_without_filters'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/filters.rb:368:in `perform_action_without_benchmark'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue'
/usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/rescue.rb:82:in `perform_action'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:408:in `process_without_filters'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/filters.rb:377:in `process_without_session_management_support'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/session_management.rb:117:in `process'
/usr/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/dispatcher.rb:38:in `dispatch'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel/rails.rb:78:in `process'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel/rails.rb:76:in `process'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel.rb:618:in `process_client'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel.rb:617:in `process_client'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel.rb:736:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel.rb:736:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel.rb:720:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel/configurator.rb:271:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel/configurator.rb:270:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/bin/mongrel_rails:127:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/lib/mongrel/command.rb:211:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-0.3.20/bin/mongrel_rails:243
/usr/bin/mongrel_rails:18

From patched at sourfamily.com Fri Feb 2 19:00:49 2007
From: patched at sourfamily.com (Gregg Pollack)
Date: Sat, 3 Feb 2007 01:00:49 +0100
Subject: [Ferret-talk] Boost Sorting with Acts_as_ferret?
Message-ID: <548044c545fd400a1befe95a86d8e089@ruby-forum.com>

Hey guys,

Simple question here.

I have a single index of recipes, from which I'm looking at the following fields: Name, Ingredient Text, Tags, and Description.

The key is, I want to show all the results that come from Name, before I show any of the results from Ingredient Text, Tags, or Description.

I tried doing this:

acts_as_ferret :fields => {
  :name => {:boost => 9000, :store => :yes},
  :ingredients => {:boost => 6000, :store => :yes},
  :tags => {:boost => 3000, :store => :yes},
  :description => {:boost => 1, :store => :yes}}

I figured if I put huge boost on my fields, the "name" results would always come before ingredients (no matter what the score).

Can anyone throw me any ideas on how one might do this? Here is an explain of one of my first results. Perhaps I need to change the rounding of my score? I dunno.

Basque chicken scored 1.0
3291.866 = product of:
  6583.732 = sum of:
    674.6201 = weight(ingredients_without_brackets:chicken in 12670), product of:
      0.2805449 = query_weight(ingredients_without_brackets:chicken), product of:
        3.13109 = idf(doc_freq=2145)
        0.08959977 = query_norm
      2404.677 = field_weight(ingredients_without_brackets:chicken in 12670), product of:
        1.0 = tf(term_freq(ingredients_without_brackets:chicken)=1)
        3.13109 = idf(doc_freq=2145)
        768.0 = field_norm(field=ingredients_without_brackets, doc=12670)
    5909.112 = weight(tags_with_spaces:chicken in 12670), product of:
      0.4547729 = query_weight(tags_with_spaces:chicken), product of:
        5.075603 = idf(doc_freq=306)
        0.08959977 = query_norm
      12993.54 = field_weight(tags_with_spaces:chicken in 12670), product of:
        1.0 = tf(term_freq(tags_with_spaces:chicken)=1)
        5.075603 = idf(doc_freq=306)
        2560.0 = field_norm(field=tags_with_spaces, doc=12670)
  0.5 = coord(2/4)

Thanks in advance,
-Gregg

From www-data at server.andreas-s.net Sat Feb 3 21:46:26 2007
From: www-data at server.andreas-s.net (www-data)
Date: Sun, 4 Feb 2007 03:46:26 +0100
Subject: [Ferret-talk] Just a thought
In-Reply-To: <20070131085302.GE21355@cordoba.webit.de>
References: <1169681985.11386.32.camel@localhost.localdomain> <20070126132601.GB26810@cordoba.webit.de> <20070131085302.GE21355@cordoba.webit.de>

It doesn't seem to be common knowledge yet but are you running the engines plugin?
If so, it needs to be updated to use rails 1.2+. And, if you're using login_engine and/or user_engine, they will need to be removed. In other words, you can't use login_engine or user_engine with rails 1.2+. And the engines plugin is what caused similar errors for me.

From kraemer at webit.de Sun Feb 4 12:23:22 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Sun, 4 Feb 2007 18:23:22 +0100
Subject: [Ferret-talk] Boost Sorting with Acts_as_ferret?
In-Reply-To: <548044c545fd400a1befe95a86d8e089@ruby-forum.com>
References: <548044c545fd400a1befe95a86d8e089@ruby-forum.com>
Message-ID: <20070204172322.GA29012@cordoba.webit.de>

Hi!

On Sat, Feb 03, 2007 at 01:00:49AM +0100, Gregg Pollack wrote:
> Hey guys,
>
> Simple question here.
>
> I have a single index of recipes, from which I'm looking at the
> following fields: Name, Ingredient Text, Tags, and Description.
>
> The key is, I want to show all the results that come from Name,
> before I show any of the results from Ingredient Text, Tags, or
> Description.
>
> I tried doing this:
>
> acts_as_ferret :fields => {
> :name => {:boost => 9000, :store => :yes},
> :ingredients => {:boost => 6000, :store => :yes},
> :tags => {:boost => 3000, :store => :yes},
> :description => {:boost => 1, :store => :yes}}
>
> I figured if I put huge boost on my fields, the "name" results would
> always come before ingredients (no matter what the score).

Sounds reasonable.

> Can anyone throw me any ideas on how one might do this? Here is an
> explain of one of my first results. Perhaps I need to change the
> rounding of my score? I dunno.

Scores are always between 0 and 1. You might want to set less aggressive boost values; it's the relation between them that counts, more than their absolute value. However, that way you won't have a guarantee that the sorting will be as you intend.

If this really is that important, run an explicit search only against the name field, and then one against all fields, excluding the hits you got with the first search:

# get results where name fields match (might be good to escape at least
# '(' and ')' in query):
name_results = Recipe.find_by_contents(%{name:(#{query})})

# now get results where other fields match, but exclude those we already
# have:
ids = name_results.map(&:id).join ' OR '
other_results = Recipe.find_by_contents(%{#{query} -id:(#{ids})})

Jens

From patched at sourfamily.com Sun Feb 4 14:10:21 2007
From: patched at sourfamily.com (Gregg Pollack)
Date: Sun, 4 Feb 2007 20:10:21 +0100
Subject: [Ferret-talk] Boost Sorting with Acts_as_ferret?
In-Reply-To: <20070204172322.GA29012@cordoba.webit.de>
References: <548044c545fd400a1befe95a86d8e089@ruby-forum.com> <20070204172322.GA29012@cordoba.webit.de>

Jens,

Thanks for the suggestion. However, in this case the #ids field could contain a thousand ids, which I'm not sure is good.

However, you just gave me another idea that involves 4 queries, but it doesn't seem to be working as it should. Can you see why?

q = "chicken"

# let's get the total (where it's found in all the fields)
total = index.search_each(q) do |doc, score|
end
puts "Total should be = #{total}"

# find count of just name results
query = qp.parse("+name:(#{q})")
puts query.to_s
total = index.search_each(query) do |doc, score|
end
puts "total with name = #{total}"

# find all results where query is in ingredients but excluding the name results
query = qp.parse("+ingredients:(#{q}) -name:(#{q})")
puts query.to_s
total = index.search_each(query) do |doc, score|
end
puts "total with ingredients = #{total}"

query = qp.parse("+tags:(#{q}) -name:(#{q}) -ingredients:(#{q})")
puts query.to_s
total = index.search_each(query) do |doc, score|
end
puts "total with tags = #{total}"

query = qp.parse("+description:(#{q}) -name:(#{q}) -ingredients:(#{q}) -tags:(#{q})")
puts query.to_s
total = index.search_each(query) do |doc, score|
end
puts "total with desc = #{total}"

You would think that these fields would add up, but here are the results:

Total should be = 2225
name:chicken
total with name = 1028
+ingredients:chicken -name:chicken
total with ingredients = 2115
+tags:chicken -name:chicken -ingredients:chicken
total with tags = 18
+description:chicken -name:chicken -ingredients:chicken -tags:chicken
total with desc = 1

As you can see, the numbers aren't adding up, but shouldn't they?

Thanks in advance for your help.

From kraemer at webit.de Sun Feb 4 14:34:55 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Sun, 4 Feb 2007 20:34:55 +0100
Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret
Message-ID: <20070204193455.GB29012@cordoba.webit.de>

Hi!

Aaf trunk has undergone several major refactorings over the last few days, with the result that you can now transparently switch your app from local to remote indexing and back :-)

If you plan to scale your app to more than one physical machine, or if you have problems with corrupted indexes and the like under high load, you really should give this a try.

I wrote some documentation to get you started with the remote indexing stuff at http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer

Looking forward to your feedback,
Jens

From chad at zulu.net Mon Feb 5 04:05:17 2007
From: chad at zulu.net (Chad Thatcher)
Date: Mon, 5 Feb 2007 10:05:17 +0100
Subject: [Ferret-talk] Any word on a recent build for Win32/Rails 1.2.1

Hi,

I am developing on Win XP at the moment for an NGO that seems stuck on the idea that Win2003 Server is the future. Much as I have tried to convert them to a unix environment, they're having none of it.

I am using the prebuilt Win32 release of ferret 0.10.9 and would like to upgrade to the latest version for the latest features etc., but more importantly to solve an issue with bug reporting in Rails 1.2.1, which seems to be breaking when ferret is required and included (well, this is at least the case when using ferret from /script/runner).

When I require and include ferret, any bugs (even those unrelated to ferret) fail to be reported correctly, with messages that start:

...active_support/dependencies.rb:423:in `remove_const': cannot remove Object:: (NameError)

where the name after Object:: is something like QueryParser, Field, etc.

I have tried compiling ferret myself but I only have mingw installed and the ferret rails task seems to want to use nmake (MVC?).

Thanks,
Chad.
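The pattern behind the remote indexing Jens announces above (a single process owns the index, and every application process talks to it over DRb instead of opening the index files itself) can be sketched with Ruby's standard `drb` library. This is not acts_as_ferret's server, whose setup is described on the DrbServer wiki page linked in that message; `ToyIndexServer` is a hypothetical in-memory stand-in for a Ferret index, and both ends run in one process here only so the example is self-contained.

```ruby
require 'drb/drb'

# Hypothetical stand-in for an index owned by one server process.
class ToyIndexServer
  def initialize
    @docs = {}
    @lock = Mutex.new # DRb may serve calls from several threads
  end

  def add(id, text)
    @lock.synchronize { @docs[id] = text.downcase }
    nil
  end

  def search(term)
    @lock.synchronize do
      @docs.select { |_id, text| text.split.include?(term.downcase) }.keys.sort
    end
  end
end

# Server side: port 0 asks the OS for any free port.
DRb.start_service('druby://localhost:0', ToyIndexServer.new)
server_uri = DRb.uri

# Client side: normally another process, e.g. each Mongrel instance.
index = DRbObject.new_with_uri(server_uri)
index.add(1, 'Basque chicken')
index.add(2, 'Chicken soup')
index.add(3, 'Lentil soup')
chicken_hits = index.search('chicken')
soup_hits = index.search('Soup')
p chicken_hits # prints [1, 2]

DRb.stop_service
```

Because every write goes through the one owning process, index updates are serialized in a single place, which is presumably why this setup helps with the corrupted-index problems under load that Jens mentions.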

From neeraj.jsr at gmail.com Mon Feb 5 08:26:50 2007
From: neeraj.jsr at gmail.com (Raj Singh)
Date: Mon, 5 Feb 2007 14:26:50 +0100
Subject: [Ferret-talk] rebuild_index is returning {}
Message-ID: <1dce587bf8965eb71032756013c7dd38@ruby-forum.com>

Previously when I used to build the index I used to get false in return.

>> Event.rebuild_index
=> false

Now I get this.
>> Event.rebuild_index
=> {}

Following changes took place.

1) I moved my app from FCGI to mongrel.
2) I moved my app to capistrano.

Before moving to capistrano the code was
acts_as_ferret :fields => [ "name", "desc_uf" ]

Now the code is
acts_as_ferret :fields => [ "name", "desc_uf" ],:index_dir =>
"/home/dorelal/apps/eii_#{RAILS_ENV}/shared/ferret"

My question is this:
Now that I do Event.rebuild_index I get {} in return. Is that okay? Or does that mean something is wrong somewhere?

Thanks

From kraemer at webit.de Mon Feb 5 08:47:03 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 5 Feb 2007 14:47:03 +0100
Subject: [Ferret-talk] rebuild_index is returning {}
In-Reply-To: <1dce587bf8965eb71032756013c7dd38@ruby-forum.com>
References: <1dce587bf8965eb71032756013c7dd38@ruby-forum.com>
Message-ID: <20070205134703.GL21355@cordoba.webit.de>

On Mon, Feb 05, 2007 at 02:26:50PM +0100, Raj Singh wrote:
> Previously when I used to build the index I used to get false in return.
>
> >> Event.rebuild_index
> => false
>
> Now I get this.
> >> Event.rebuild_index
> => {}
>
> Following changes took place.
>
> 1) I moved my app from FCGI to mongrel.
> 2) I moved my app to capistrano.
>
> Before moving to capistrano the code was
> acts_as_ferret :fields => [ "name", "desc_uf" ]
>
>
> Now the code is
> acts_as_ferret :fields => [ "name", "desc_uf" ],:index_dir =>
> "/home/dorelal/apps/eii_#{RAILS_ENV}/shared/ferret"
>
> My question is this:
> Now that I do Event.rebuild_index I get {} in return. Is that okay? Or
> does that mean something is wrong somewhere?

As of now, the return value of rebuild_index does not mean anything and has changed between aaf versions. In 0.3.0, index.close() was the last call in the method; in 0.3.1 it's been an assignment to Hash.new, so {} looks like a correct return value for this version.

However, I don't see how the return value should change from false to {} without switching the aaf version.

Jens

PS: I promise to set the return value of rebuild_index to something meaningful (maybe the number of records indexed?) in future versions, as people really seem to pay attention to it ;-)

From neeraj.jsr at gmail.com Mon Feb 5 10:13:46 2007
From: neeraj.jsr at gmail.com (Raj Singh)
Date: Mon, 5 Feb 2007 16:13:46 +0100
Subject: [Ferret-talk] rebuild_index is returning {}
In-Reply-To: <20070205134703.GL21355@cordoba.webit.de>
References: <1dce587bf8965eb71032756013c7dd38@ruby-forum.com> <20070205134703.GL21355@cordoba.webit.de>
Message-ID: <8cab82ee352b9f8fb94ccbb5daad3fa8@ruby-forum.com>

You are right. I upgraded aaf too. Not sure what the earlier version was, but when I upgraded I got 'svn revision 132' for aaf. Sorry, I should have mentioned that.

Thanks

Jens Kraemer wrote:
> On Mon, Feb 05, 2007 at 02:26:50PM +0100, Raj Singh wrote:
>>
>>
>> My question is this:
>> Now that I do Event.rebuild_index I get {} in return. Is that okay? Or
>> does that mean something is wrong somewhere?
>
> As of now, the return value of rebuild_index does not mean anything and
> has changed between aaf versions. In 0.3.0, index.close() was the last call
> in the method, in 0.3.1 it's been an assignment to Hash.new, so {} looks like
> a correct return value for this version.
>
> However I don't see how the return value should change from false to {}
> without switching the aaf version.
>
> Jens
>
> PS: I promise to set the return value of rebuild_index to something
> meaningful (maybe the number of records indexed?) in future versions, as
> people really seem to pay attention to it ;-)

From divotdave at mac.com Tue Feb 6 14:09:57 2007
From: divotdave at mac.com (divotdave)
Date: Tue, 6 Feb 2007 20:09:57 +0100
Subject: [Ferret-talk] Error : End-of-File Error occured at
In-Reply-To: <7bdd7e18d9a394b1b19d82ed22166196@ruby-forum.com>
References: <1567e15bddc1e640d2b2e17e3411c84f@ruby-forum.com> <742d3d03a03681833faeee146ccb259f@ruby-forum.com> <453524DC.1080800@benjaminkrause.com> <7bdd7e18d9a394b1b19d82ed22166196@ruby-forum.com>

Raj Singh wrote:
> This problem is back and now I know the pattern.
>
> I rebuilt the index and things started working. Then I started adding
> events again to the application. After adding 60/70 events the problem
> was back. I got the exception because of
>
> End-of-File Error occured at
>
> Then I rebuilt the index and it started working. Again I had the same
> issue after I added 50/60 records.
>
> Am I missing something here. I am using ferret 0.10.9 and the latest
> acts_as_ferret plugin.
>
> Thanks

In case anybody comes across this thread... I had a similar problem, but discovered that it was happening because I was trying to write to my Model with update_attribute (updating a posting view stat) within the same method I was running the search from with AAF.

The solution for me was to include a model.disable_ferret(:once) statement in the write procedure within my controller action, so that AAF didn't try to update the index (to a field that didn't matter anyway, though I had explicitly excluded it from the :fields array in Model.rb) while running a search at the same time. So now my code looks like this:

def search
  @models = Model.find_by_contents('foo')
  @models.each do |m|
    @model = Model.find(m.id)
    @model.disable_ferret(:once)
    @model.update_attribute('views', if @model.views == 0 then 1 else @views + 1 end)
  end
end

This also greatly improved my search speed.

You can read more here -> http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage

Hope this helps someone.

From divotdave at mac.com Tue Feb 6 14:12:28 2007
From: divotdave at mac.com (divotdave)
Date: Tue, 6 Feb 2007 20:12:28 +0100
Subject: [Ferret-talk] Error : End-of-File Error occured at
References: <1567e15bddc1e640d2b2e17e3411c84f@ruby-forum.com> <742d3d03a03681833faeee146ccb259f@ruby-forum.com> <453524DC.1080800@benjaminkrause.com> <7bdd7e18d9a394b1b19d82ed22166196@ruby-forum.com>
Message-ID: <1d062b340779620c029e28ce2b0ac350@ruby-forum.com>

Ooops...

> def search
>   @models = Model.find_by_contents('foo')
>   @models.each do |m|
>     @model = Model.find(m.id)
>     @model.disable_ferret(:once)
>     @model.update_attribute('views', if @model.views == 0 then 1
> else @views + 1 end)
>   end
> end
>
> This also greatly improved my search speed.
>
> You can read more here ->
> http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
>
> Hope this helps someone.

@views in the @model.update_attribute line should be @model.views... Should proofread better ;)

From mischa78 at xs4all.nl Tue Feb 6 17:22:39 2007
From: mischa78 at xs4all.nl (Mischa Berger)
Date: Tue, 6 Feb 2007 23:22:39 +0100
Subject: [Ferret-talk] Which method to use to get content from index with a_a_f?

Hi everybody,

After staring at the a_a_f API for quite some time now, I decided it's time to ask...

Which method should I use to get content from the index without using highlight? Consider the following controller action:

def preview
  if params[:search].blank? # normal case
    @text = @myfile. # which method do I use here to get the :text from the index???
  else # if we come from the search results page
    @text = @myfile.highlight(params[:search], { :field => :text, :excerpt_length => :all, :pre_tag => '[highlight]', :post_tag => '[/highlight]' })
  end
end

I didn't store my text in the database, only in the index. When I'm coming from a search I use the highlight method, so the term I searched for gets highlighted, but how do I get the text from the index in a 'normal case'? The highlight method feels inappropriate, because I don't want to highlight anything. I don't see what other method to use.

Thanks in advance!

Mischa.

From david.to at gmail.com Tue Feb 6 17:43:57 2007
From: david.to at gmail.com (dave)
Date: Tue, 6 Feb 2007 23:43:57 +0100
Subject: [Ferret-talk] Getting "ArgumentError ( isn't a valid directory argume
In-Reply-To: <6b7d087a8742b63941fc099bbc5b3d4b@ruby-forum.com>
References: <3986844f3d800df30733372554d993a4@ruby-forum.com> <28b1b191392270b0d5545555885147df@ruby-forum.com> <20061217220707.GA21628@cordoba.webit.de> <860f8413835b75e43b8b0c866b78ad04@ruby-forum.com> <002a6391f3b20ac93d068e6ce6f998c0@ruby-forum.com> <6b7d087a8742b63941fc099bbc5b3d4b@ruby-forum.com>

Aryk Grosz wrote:
> So will the 0.10.11 version of this plugin fix this problem? I'm getting
> the error sporadically after the index was already built. I can't seem to
> figure out what's causing it.

If it only happens sporadically, I would say that it might be because the index hasn't been built. Is there content in your index dir when you encounter this?

From tennisbum2002 at hotmail.com Tue Feb 6 21:28:43 2007
From: tennisbum2002 at hotmail.com (Aryk Grosz)
Date: Wed, 7 Feb 2007 03:28:43 +0100
Subject: [Ferret-talk] Getting "ArgumentError ( isn't a valid directory argume
References: <3986844f3d800df30733372554d993a4@ruby-forum.com> <28b1b191392270b0d5545555885147df@ruby-forum.com> <20061217220707.GA21628@cordoba.webit.de> <860f8413835b75e43b8b0c866b78ad04@ruby-forum.com> <002a6391f3b20ac93d068e6ce6f998c0@ruby-forum.com> <6b7d087a8742b63941fc099bbc5b3d4b@ruby-forum.com>
Message-ID: <3c17fe5589b6d033c9f4090688267dc6@ruby-forum.com>

Are you suggesting that the content is getting deleted? This happens after several successful searches using the index, and then it crashes. Every time I check, there are index files in the index folder, though...

dave wrote:
> Aryk Grosz wrote:
>> So will the 0.10.11 version of this plugin fix this problem? I'm getting
>> the error sporadically after the index was already built. I can't seem to
>> figure out what's causing it.
>
> If it only happens sporadically, I would say that it might be because
> the index hasn't been built. Is there content in your index dir when you
> encounter this?

From kraemer at webit.de Wed Feb 7 05:48:42 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 7 Feb 2007 11:48:42 +0100
Subject: [Ferret-talk] Which method to use to get content from index with a_a_f?
Message-ID: <20070207104842.GB20182@cordoba.webit.de>

On Tue, Feb 06, 2007 at 11:22:39PM +0100, Mischa Berger wrote:
> Hi everybody,
>
> After staring at the a_a_f API for quite some time now, I decided it's
> time to ask...
>
> Which method should I use to get content from the index without using
> highlight? Consider the following controller action:
>
> def preview
>   if params[:search].blank? # normal case
>     @text = @myfile. # which method do I use here to get the :text from
> the index???
> else # if we come from the search results page > @text = @myfile.highlight(params[:search], { :field => :text, > :excerpt_length => :all, :pre_tag => '[highlight]', :post_tag => > '[/highlight]' }) > end > end > > I didn't store my text in the database, only in the index. When I'm > coming from a search I use the highlight method, so the term I searched > for gets highlighted, but how do I get the text from the index in the > 'normal case'? The highlight method feels inappropriate, because I don't > want to highlight anything. I don't see what other method to use. That's because there is currently no aaf way to do this. You can, however, get a handle to the Ferret::Index instance by calling YourModel.ferret_index (or YourModel.aaf_index.ferret_index if you're using the latest aaf trunk). Now you can query this index instance for your primary key with something along the lines of "id:#{self.id}", and then retrieve field values via the standard Ferret API. Note that this won't work with a remote index, since the index in this case is not necessarily on the same machine and therefore cannot be accessed via Ferret's own API. I already started thinking about a way to integrate this into the API, maybe via an option to find_by_contents that lets you specify the fields you want to fetch directly from the index. That could then be combined with lazy-loading of the 'real' record from the DB for more speed when it comes to live searches and the like. Jens From john at johnleach.co.uk Wed Feb 7 10:26:17 2007 From: john at johnleach.co.uk (John Leach) Date: Wed, 07 Feb 2007 15:26:17 +0000 Subject: [Ferret-talk] "Illegal state of TermDocEnum" error Message-ID: <1170861977.24299.10.camel@localhost.localdomain> Hi, I've upset Ferret (again).
When searching for: "us military" -bomb I get the following exception: State Error occured at :79 in xraise Error occured in index.c:2089 - stde_doc_num Illegal state of TermDocEnum. You must call #next before you call #doc_num If I drop the quotes around "us military", or drop "-bomb", it works fine. I can search for -bomb on its own, and other variations, successfully. I've tried recreating the index from scratch too. The only way I've found to fix the problem is to optimize the index. I'd rather not have to do that every time I add new documents. I found the following previous list post on the same subject. Dave's response suggests 0.10.9, which I tried with no improvement. http://rubyforge.org/pipermail/ferret-talk/2006-October/001669.html I've reproduced this repeatedly with 0.10.9, 0.10.10 and 0.10.14 (with the same set of documents). Any ideas what the error even means? Thanks, John. -- http://johnleach.co.uk From miggymarley at yahoo.com Wed Feb 7 12:39:05 2007 From: miggymarley at yahoo.com (michael) Date: Wed, 7 Feb 2007 18:39:05 +0100 Subject: [Ferret-talk] Which version to install? Message-ID: <2670bb3d175a735693f324715025e943@ruby-forum.com> Hi All, I am kind of new to this. Could you please let me know which version of Ferret and AAF to install (I am developing on Windows)? I tried installing Ferret 0.10.14-beta, but it does not seem to work. The last one that works for WIN seems to be 0.10.9, is that right? Thanks in advance! Mike -- Posted via http://www.ruby-forum.com/. From mischa78 at xs4all.nl Wed Feb 7 15:44:11 2007 From: mischa78 at xs4all.nl (Mischa Berger) Date: Wed, 7 Feb 2007 21:44:11 +0100 Subject: [Ferret-talk] Which method to use to get content from index with a_a_f In-Reply-To: <20070207104842.GB20182@cordoba.webit.de> References: <20070207104842.GB20182@cordoba.webit.de> Message-ID: <61e39cc726088fad1150a3cc3116ad60@ruby-forum.com> Thanks for the tip! I expected aaf to have a method for this, but the way you describe works fine.
I'm doing it like this now: @text = Myfile.ferret_index[@myfile.document_number][:text] -- Posted via http://www.ruby-forum.com/. From mischa78 at xs4all.nl Wed Feb 7 16:04:40 2007 From: mischa78 at xs4all.nl (Mischa Berger) Date: Wed, 7 Feb 2007 22:04:40 +0100 Subject: [Ferret-talk] Getting "ArgumentError ( isn't a valid directory argume In-Reply-To: <002a6391f3b20ac93d068e6ce6f998c0@ruby-forum.com> References: <3986844f3d800df30733372554d993a4@ruby-forum.com> <28b1b191392270b0d5545555885147df@ruby-forum.com> <20061217220707.GA21628@cordoba.webit.de> <860f8413835b75e43b8b0c866b78ad04@ruby-forum.com> <002a6391f3b20ac93d068e6ce6f998c0@ruby-forum.com> Message-ID: <944b0b4950a065415e6d11a694f7241a@ruby-forum.com> > I find the problem. The method multi-search supposes that the index has > been constructed before. A solution can be to realize a find_by_contents > in all models at begin (to construct the indexes), some one knows other > better solution? (Re)building the index with rebuild_index sounds more appropriate to me than 'to realize a find_by_contents'. HTH Mischa -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu Feb 8 04:22:02 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 8 Feb 2007 10:22:02 +0100 Subject: [Ferret-talk] Which version to install? In-Reply-To: <2670bb3d175a735693f324715025e943@ruby-forum.com> References: <2670bb3d175a735693f324715025e943@ruby-forum.com> Message-ID: <20070208092202.GD20182@cordoba.webit.de> On Wed, Feb 07, 2007 at 06:39:05PM +0100, michael wrote: > Hi All, I am kind of new to this. Could you please let me know which > version of Ferret and AAF to install (I am developing on Windows)? I > tried installing Ferret 0.10.14-beta, but it does not seem to work. > The last one that works for WIN seems to be 0.10.9, is that right? I guess that's right. If you know how to do it, building the latest version on Windows should be possible, too.
As there haven't been any API changes between 0.10.9 and 0.10.14, you should be safe using the latest acts_as_ferret version from svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret Jens From kraemer at webit.de Thu Feb 8 04:23:24 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 8 Feb 2007 10:23:24 +0100 Subject: [Ferret-talk] Getting "ArgumentError ( isn't a valid directory argume In-Reply-To: <944b0b4950a065415e6d11a694f7241a@ruby-forum.com> References: <3986844f3d800df30733372554d993a4@ruby-forum.com> <28b1b191392270b0d5545555885147df@ruby-forum.com> <20061217220707.GA21628@cordoba.webit.de> <860f8413835b75e43b8b0c866b78ad04@ruby-forum.com> <002a6391f3b20ac93d068e6ce6f998c0@ruby-forum.com> <944b0b4950a065415e6d11a694f7241a@ruby-forum.com> Message-ID: <20070208092323.GE20182@cordoba.webit.de> On Wed, Feb 07, 2007 at 10:04:40PM +0100, Mischa Berger wrote: > > I find the problem. The method multi-search supposes that the index has > > been constructed before. A solution can be to realize a find_by_contents > > in all models at begin (to construct the indexes), some one knows other > > better solution? > > (Re)building the index with rebuild_index sounds more appropriate to me > than 'to realize a find_by_contents'. Yeah, however, find_by_contents implicitly rebuilds the index if none exists ;-) Jens From pumpkingod at gmail.com Fri Feb 9 08:05:48 2007 From: pumpkingod at gmail.com (pumpkin) Date: Fri, 9 Feb 2007 14:05:48 +0100 Subject: [Ferret-talk] [AAF] Determine matching field in a search result Message-ID: <4b7b087c3030d36e5d53708613a563e5@ruby-forum.com> I was wondering if it was possible to determine which field caused the match to occur (or contributed the most to the score) in a search result. For example, if I had a record { :name => 'pumpkin', :age => -5, :address => 'Via Mandriola' } and I did a search for 'Mandriola', is it possible for acts_as_ferret to tell me that the match actually came from the :address field? In some cases there will be several matching fields, but I feel it should be possible. Thank you :) -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Feb 9 08:39:12 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 9 Feb 2007 14:39:12 +0100 Subject: [Ferret-talk] [AAF] Determine matching field in a search result In-Reply-To: <4b7b087c3030d36e5d53708613a563e5@ruby-forum.com> References: <4b7b087c3030d36e5d53708613a563e5@ruby-forum.com> Message-ID: <20070209133912.GM20182@cordoba.webit.de> On Fri, Feb 09, 2007 at 02:05:48PM +0100, pumpkin wrote: > I was wondering if it was possible to determine which field caused the > match to occur (or contributed the most to the score) in a search > result. > > For example, if I had a record > > { :name => 'pumpkin', :age => -5, :address => 'Via Mandriola' } > > and I did a search for 'Mandriola', is it possible for acts_as_ferret to > tell me that the match actually came from the :address field? In some > cases there will be several matching fields, but I feel it should be > possible. I know of no way to do this via Ferret's API, so aaf doesn't support this either.
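The per-field check described in the rest of this reply (one `+id:... +field:...` query per indexed field, looking only at the hit count) could be sketched as follows. This is a minimal sketch, not part of acts_as_ferret: `matching_fields` is a hypothetical helper, the hit counter is passed in as a callable so any way of reaching the index can be plugged in, and it assumes a single-term query, as in the example.

```ruby
# Hypothetical helper: returns the fields in which +term+ matches the record
# with +record_id+, by running one "+id:<id> +<field>:<term>" query per field
# and looking only at the number of hits (0 or 1 for a unique id).
#
# +count_hits+ is any callable that takes a query string and returns a hit
# count. With a plain Ferret index this could be, for example,
#   ->(q) { index.search(q).total_hits }
# (an assumption about your setup; adapt it to however you reach the index).
def matching_fields(record_id, term, fields, count_hits)
  fields.select do |field|
    count_hits.call("+id:#{record_id} +#{field}:#{term}") > 0
  end
end

# Example with a stubbed counter standing in for the index:
fake_hits = { "+id:7 +address:mandriola" => 1 }
counter = ->(q) { fake_hits.fetch(q, 0) }
p matching_fields(7, "mandriola", [:name, :address], counter)
# => [:address]
```

Passing the counter in keeps the helper independent of whether you go through the model class or a raw `Ferret::Index::Index`.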
You can run explain on a query, but that more or less does the searching again and generates textual output that shows how the score has been calculated. However, if you only want to know which field the hit came from, just do a search for "+id:#{record.id} +address:#{query}" and repeat that for every other field you have indexed. If you call total_hits rather than find_by_contents with these queries and check whether the result is 0 or 1, this should even be quite fast. Jens From jenny_alohaz at yahoo.com Fri Feb 9 12:28:43 2007 From: jenny_alohaz at yahoo.com (jen) Date: Fri, 9 Feb 2007 18:28:43 +0100 Subject: [Ferret-talk] Sorting/Ordering Search Results In-Reply-To: <627fa564561099624485951a58b748f7@ruby-forum.com> References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com> <13155345b60df7c0b4e24d4fde8f5d21@ruby-forum.com> <022669AA-F090-474D-A44F-2B368178ED82@benjaminkrause.com> <627fa564561099624485951a58b748f7@ruby-forum.com> Message-ID: <997f4310c462740ef6bb7dd60c62d82f@ruby-forum.com> Sean Osh wrote: > Benjamin Krause wrote: >> Hey... >> >> just checked back with david about that sorting issue.. The problem >> on my index was, that you cannot sort by fields that you've indexed. >> you need to store the fields untokenized if you want to sort by them. >> maybe that'll fix your problem as well? >> >> Ben > > Yeah that did it! Thanks for all the help! Hi, I have a field named 'title'. If I tokenize this field, then I'm able to search it by keywords but not able to sort it. However, if I untokenize it so that sorting works, then the search fails. It's a dilemma, but I'm sure there must be some solution to this problem, right? It can't be that for any particular field I have to choose between whether I want to search it or sort by it. Thanks.
Jen -- Posted via http://www.ruby-forum.com/. From andreas.korth at gmx.net Fri Feb 9 12:57:56 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Fri, 9 Feb 2007 18:57:56 +0100 Subject: [Ferret-talk] Sorting/Ordering Search Results In-Reply-To: <997f4310c462740ef6bb7dd60c62d82f@ruby-forum.com> References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com> <13155345b60df7c0b4e24d4fde8f5d21@ruby-forum.com> <022669AA-F090-474D-A44F-2B368178ED82@benjaminkrause.com> <627fa564561099624485951a58b748f7@ruby-forum.com> <997f4310c462740ef6bb7dd60c62d82f@ruby-forum.com> Message-ID: <58647320-ACD0-40E0-8A05-794521B25963@gmx.net> On Feb 9, 2007, at 6:28 PM, jen wrote: > I have a field named 'title'. > If I tokenize this field then I'm able to search it by keywords but > not > able to sort it. However if I untokenize so that sorting works, then > the search fails. > > It's a dilemma but I'm sure there must be some solution to this > problem, > right? > > It can't be that for any particular field I have to choose between > whether I want to search it or sort by it. What about FieldInfo.new(:title, :index => :yes, :store => :yes) This should store the field in its original format while indexing it tokenized. If this doesn't work, I'd consider it a bug, since the documentation suggests that :store and :index are independent options. At least, it doesn't state otherwise. Setting a certain value for :store should not conflict or interfere with any value for :index. If this is indeed not working as expected, you could use two fields, one which you store untokenized (for sorting) and one which you don't store but index tokenized. That wouldn't be an elegant solution, but a feasible one.
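The two-field workaround amounts to feeding the same value into two fields when building each document. A minimal sketch in plain Ruby, assuming the field options themselves are configured separately (for instance a tokenized `:title` and an `:untokenized` `:sortable_title` via `FieldInfo`); `title_fields` is a hypothetical helper, and downcasing and stripping the sort key is an extra assumption so that sorting ignores case.

```ruby
# Hypothetical helper: builds the document hash for one record so that the
# same title goes into two fields:
#   :title          - meant to be indexed tokenized, for keyword search
#   :sortable_title - meant to be indexed untokenized (one term), for sorting
# Downcasing/stripping the sort key is an assumption so that "apple" and
# "Banana" sort alphabetically regardless of case.
def title_fields(title)
  text = title.to_s
  { :title => text, :sortable_title => text.strip.downcase }
end

docs = ["Zebra crossing", "apple pie", "Banana split"].map { |t| title_fields(t) }
# Sorting on the untokenized key orders by the whole title:
p docs.sort_by { |d| d[:sortable_title] }.map { |d| d[:title] }
# => ["apple pie", "Banana split", "Zebra crossing"]
```

Each hash would then be merged with the record's other fields before being added to the index, with searches going against `:title` and sorting against `:sortable_title`.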
Cheers, Andreas From mark at mark.com Fri Feb 9 19:00:46 2007 From: mark at mark.com (Mark) Date: Sat, 10 Feb 2007 01:00:46 +0100 Subject: [Ferret-talk] Ferret and Paginating Find Message-ID: <6cbfd1494cab68ee504fd9091f6a36aa@ruby-forum.com> Hey all, I've been really happy with ferret thus far and all my search on my site is based on it. One of the recent challenges I ran into is changing some of my pagination within my site. Until now, I just used the tutorials out there that talk about how to get pagination working with acts_as_ferret. Recently, I decided to change my pagination to begin using the "Paginating Find" plugin. http://cardboardrocket.com/pages/paginating_find I'm using the "Paginating Find" plugin and combining it with the following guide to get Digg.com-style pagination links. http://www.igvita.com/blog/2006/09/10/faster-pagination-in-rails/ For most of my site, that works fine, but I haven't been too successful in getting it to work with acts_as_ferret, and I'm not quite sure where to start. Has anyone tried this out before, or does anyone have any ideas that might point me in the right direction? Thanks in advance! -- Posted via http://www.ruby-forum.com/. From andreas.korth at gmx.net Fri Feb 9 19:36:49 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Sat, 10 Feb 2007 01:36:49 +0100 Subject: [Ferret-talk] Ferret and Paginating Find In-Reply-To: <6cbfd1494cab68ee504fd9091f6a36aa@ruby-forum.com> References: <6cbfd1494cab68ee504fd9091f6a36aa@ruby-forum.com> Message-ID: On Feb 10, 2007, at 1:00 AM, Mark wrote: > I've been really happy with ferret thus far and all my search on my > site > is based on it. One of the recent challenges I ran into is changing > some > of my pagination within my site. Until now, I just used the tutorials > out there that talk about how to get pagination working with > acts_as_ferret.
Here's a thread on pagination with acts_as_ferret: http://www.ruby-forum.com/topic/64033 HTH, Andreas From samuelgiffney at gmail.com Sat Feb 10 00:03:47 2007 From: samuelgiffney at gmail.com (Sam) Date: Sat, 10 Feb 2007 06:03:47 +0100 Subject: [Ferret-talk] Adding entry breaks index Message-ID: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> Our Ferret 0.10.13 index has been slowly growing on our Debian server and has just got up over 14,000 records. Yesterday I randomly noticed that one search I did was suddenly giving whack, unexpected results. I have spent much time trying to track down the problem. Tried Ferret 0.10.9 - no change. Tried on a Windows machine - where it works fine and doesn't give weird results (which just adds to the strangeness - anyway, I need it to work on the Debian server). Narrowed it down to one single entry that, when added to or deleted from the index, completely changes results in unrelated searches. A little console output shows this best:

    index = Ferret::Index::Index.new(FerretConfig::INDEXOPTIONS)
    puts index.search("westpac").total_hits
    286
    puts index.search("westpac branch").total_hits
    277
    doc = Entry.find(1094481).make_entry_ferret_doc
    => {:latitude1d=>"36.9", :address=>"61 Remuera Rd, Newmarket", :longitude1d=>"174.8", :name=>"Spiro's Florists", :precision=>"1 number", :tags=>"Flowers, bouquets, gift baskets, permanent floral arrangements, inter-flora", :zid=>1094481}
    index << doc
    index.flush
    index.optimize
    puts index.search("westpac").total_hits
    286
    puts index.search("westpac branch").total_hits
    3
    index.delete("1094481")
    index.flush
    index.optimize
    puts index.search("westpac").total_hits
    286
    puts index.search("westpac branch").total_hits
    277

I'm completely lost on this. It makes no sense to me at all. Rebuilding the index doesn't help. The same thing happens on two similar but independent Debian boxes. Anyone got any clues as to where to start?
While it's fine to just remove this entry and presume everything is working, without knowing why it breaks it's pretty hard to have faith in the index not breaking again... Really appreciate any thoughts, Sam -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Sat Feb 10 03:55:27 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 10 Feb 2007 09:55:27 +0100 Subject: [Ferret-talk] Adding entry breaks index In-Reply-To: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> References: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> Message-ID: <20070210085527.GA9589@cordoba.webit.de> On Sat, Feb 10, 2007 at 06:03:47AM +0100, Sam wrote: > Our ferret 0.10.13 index has been slowly growing on our debian server > and has just got up over 14,000 records. Yesterday I randomly noticed > that one search I did was suddenly giving whack, unexpected results. I > have spent much time trying to track the problem. > > Tried ferret 0.10.9 - no change. > Tried on a windows machine - where it works fine, and doesn't give weird > results (which just adds to the strangeness - anyway I need it to work > on the debian server) Could you try Ferret 0.10.14? > narrowed it down to one single entry that when you add or delete from > the index completely changes results in unrelated searches. > a little console output shows this best. > > index = Ferret::Index::Index.new(FerretConfig::INDEXOPTIONS) > > puts index.search("westpac").total_hits > 286 > puts index.search("westpac branch").total_hits > 277 > > doc = Entry.find(1094481).make_entry_ferret_doc > => {:latitude1d=>"36.9", :address=>"61 Remuera Rd, Newmarket", > :longitude1d=>"174.8", :name=>"Spiro's Florists", :precision=>"1 > number", :tags=>"Flowers, bouquets, gift baskets, permanent floral > arrangements, inter-flora", :zid=>1094481} > index << doc > index.flush > index.optimize > > puts index.search("westpac").total_hits > 286 > puts index.search("westpac branch").total_hits > 3 Really strange.
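The field-by-field elimination suggested in this reply can be prepared mechanically: build one copy of the suspicious document per field, with just that field emptied, then add each copy, re-run the failing query and delete it again. A minimal sketch in plain Ruby; `blank_field_variants` is a hypothetical helper, the sample hash is abridged from Sam's console output, and keeping `:zid` untouched assumes it is the key field that `index.delete` uses.

```ruby
# Hypothetical helper: returns [field, variant] pairs, where each variant is a
# copy of +doc+ with exactly one field emptied. Fields listed in +keep+ (such
# as the key field used to delete the document again) are never blanked, and
# +doc+ itself is left unchanged because Hash#merge returns a new hash.
def blank_field_variants(doc, keep = [:zid])
  (doc.keys - keep).map { |field| [field, doc.merge(field => "")] }
end

# Abridged from the document in Sam's console session:
doc = { :zid => 1094481, :name => "Spiro's Florists",
        :tags => "Flowers, bouquets, gift baskets" }

blank_field_variants(doc).each do |field, variant|
  # Against the real index you would add +variant+ (index << variant),
  # call index.flush, compare index.search("westpac branch").total_hits
  # with the known-good 277, then delete the entry again.
  puts "without #{field}: #{variant.inspect}"
end
```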
To track this down further, I'd try variations of this record, i.e. leave one field empty, then another, to find out which field's value is causing this problem. BTW, what number of hits does index.search("branch").total_hits yield with/without that record? Jens From kraemer at webit.de Sat Feb 10 04:19:14 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 10 Feb 2007 10:19:14 +0100 Subject: [Ferret-talk] Ferret and Paginating Find In-Reply-To: <6cbfd1494cab68ee504fd9091f6a36aa@ruby-forum.com> References: <6cbfd1494cab68ee504fd9091f6a36aa@ruby-forum.com> Message-ID: <20070210091914.GB9589@cordoba.webit.de> On Sat, Feb 10, 2007 at 01:00:46AM +0100, Mark wrote: > Hey all, > > I've been really happy with ferret thus far and all my search on my site > is based on it. One of the recent challenges I ran into is changing some > of my pagination within my site. Until now, I just used the tutorials > out there that talk about how to get pagination working with > acts_as_ferret. > > Recently, I decided to change my pagination to begin using the > "Paginating Find" plugin. > > http://cardboardrocket.com/pages/paginating_find As this hooks into AR's find method, you'd have to issue the ActiveRecord find() call that retrieves the result set yourself, and let aaf handle just the pure Ferret search by using find_id_by_contents instead of find_by_contents. However, any sorting done by Ferret (by score or by any other Ferret field) will be lost that way. Jens From kraemer at webit.de Sat Feb 10 04:39:32 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 10 Feb 2007 10:39:32 +0100 Subject: [Ferret-talk] Sorting/Ordering Search Results In-Reply-To: <58647320-ACD0-40E0-8A05-794521B25963@gmx.net> References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com> <13155345b60df7c0b4e24d4fde8f5d21@ruby-forum.com> <022669AA-F090-474D-A44F-2B368178ED82@benjaminkrause.com> <627fa564561099624485951a58b748f7@ruby-forum.com> <997f4310c462740ef6bb7dd60c62d82f@ruby-forum.com> <58647320-ACD0-40E0-8A05-794521B25963@gmx.net> Message-ID: <20070210093932.GC9589@cordoba.webit.de> On Fri, Feb 09, 2007 at 06:57:56PM +0100, Andreas Korth wrote: > > On Feb 9, 2007, at 6:28 PM, jen wrote: > > > I have a field named 'title'. > > If I tokenize this field then I'm able to search it by keywords but > > not > > able to sort it. However if I untokenize so that sorting works, then > > the search fails. > > > > It's a dilemma but I'm sure there must be some solution to this > > problem, > > right? > > > > It can't be that for any particular field I have to choose between > > whether I want to search it or sort by it. > > What about > > FieldInfo.new(:title, :index => :yes, :store => :yes) > > This should store the field in its original format while indexing it > tokenized. > > If this doesn't work, I'd consider it a bug since the documentation > suggests that :store and :index are independent options. At least, it > doesn't state otherwise. Setting a certain value for :store should > not conflict or interfere with any value for :index. :store does not interfere with the value given for :index, but as I understand the docs, storing a field's contents doesn't help with sorting either.
From the docs, only the :index option influences the ability to sort. IMHO storing a field's content is completely independent from (un)tokenized indexing, and the sorting is done on the indexed values, not on the stored ones. > If this is indeed not working as expected, you could use two fields, > one which you store untokenized (for sorting) and one which you don't > store but index tokenized. That wouldn't be an elegant solution, but > a feasible one. That's the common solution. There's no need to use :store => :yes for any of these fields, unless you have a use for the stored original field content. So you'd have:

    # the field for search (:store => :yes or :no, whatever you want)
    FieldInfo.new(:title, :index => :yes, :store => :no)
    # the field for sorting, leaving out any info not needed for sorting
    FieldInfo.new(:sortable_title, :index => :untokenized, :store => :no, :term_vector => :no)

Jens From ed.temp.01 at gmail.com Sat Feb 10 06:29:27 2007 From: ed.temp.01 at gmail.com (Ed Ed) Date: Sat, 10 Feb 2007 12:29:27 +0100 Subject: [Ferret-talk] Adding extra fields to an index (using RDig?) Message-ID: Hello everyone, I am writing an application which collects a set of web sites and caches them locally for offline viewing. I want to do searches on this collection and associate extra data with each result (e.g. date collected, reason for collection, perhaps a sequence number). Now all this data exists when the harvesting is done and could be stored in a database. I want to use RDig to index my collection of sites. I also want to associate the index results with my extra data and display them along with the search results. The index is built once and searched many times, so I want searching to be as efficient as possible. The simplest way is to use e.g. the local URL as a key into my database (easy, but it needs to be done each time and could slow things down). Is it possible to add extra fields to Ferret index entries? If so, can this be done at create time or must it be done afterwards? If it can be done at create time, is there a way to get RDig to insert these extra fields? Thanks for any help with this Ed -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Sat Feb 10 12:33:50 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 10 Feb 2007 18:33:50 +0100 Subject: [Ferret-talk] Adding extra fields to an index (using RDig?) In-Reply-To: References: Message-ID: <20070210173350.GA22582@cordoba.webit.de> Hi! On Sat, Feb 10, 2007 at 12:29:27PM +0100, Ed Ed wrote: > Hello everyone, > > I am writing an application which collects a set of web sites and caches > them locally for offline viewing. I want to do searches on this > collection and associate extra data with each result (e.g. date > collected, reason for collection, perhaps a sequence number). > > Now all this data exists when the harvesting is done and could be stored > in a database. I want to use RDig to index my collection of sites. I also > want to associate the index results with my extra data and display them > along with search results. > > The index is built once and searched many times so I want searching to > be as efficient as possible. > > The simplest way is to use e.g. the local URL as a key into my database > (easy but needs to be done each time and could slow things down) > > Is it possible to add extra fields to ferret index entries? Of course that is possible; RDig itself uses three different fields: :url, :title and :data. > If so, can this be done at create time or must it be done afterwards? If > it can be done at create time is there a way to get RDig to insert these > extra fields?
Ferret documents cannot be modified after they have been created, so any custom fields you want to add have to be added when the index is created. Atm RDig doesn't support custom fields, however I'd be happy to apply a patch adding this capability ;-) cheers, Jens From samuelgiffney at gmail.com Sat Feb 10 14:58:27 2007 From: samuelgiffney at gmail.com (Sam) Date: Sat, 10 Feb 2007 20:58:27 +0100 Subject: [Ferret-talk] Adding entry breaks index In-Reply-To: <20070210085527.GA9589@cordoba.webit.de> References: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> <20070210085527.GA9589@cordoba.webit.de> Message-ID: Everything behaves the same with 0.10.14. index.search("branch").total_hits is constant at 811 through all tests. I guessed that it was something to do with the tags field; removing it before adding the doc made everything OK, so I played with changing the values in the tags field. I narrowed it down to this. If tags is or contains any of the following words: baskets, basket, ba, ball, baloney, basketcase, babaracchus, then the search numbers for "westpac branch" drop from 277 to 3. If tags is any of: b, ba, baracchus, then the search numbers for "westpac branch" stay at 277. Looks like even the A-team can't help me... -- Posted via http://www.ruby-forum.com/. From ed.temp.01 at gmail.com Sun Feb 11 13:17:51 2007 From: ed.temp.01 at gmail.com (Ed Ed) Date: Sun, 11 Feb 2007 19:17:51 +0100 Subject: [Ferret-talk] Adding extra fields to an index (using RDig?) In-Reply-To: <20070210173350.GA22582@cordoba.webit.de> References: <20070210173350.GA22582@cordoba.webit.de> Message-ID: Hi, To summarise, I can add custom fields at create time but not afterwards. Furthermore RDig does not presently support the addition of custom fields.
Please could you post your patch to enable RDig to support custom fields. Thanks Ed Jens Kraemer wrote: > Hi! > > On Sat, Feb 10, 2007 at 12:29:27PM +0100, Ed Ed wrote: > > > Ferret documents cannot be modified after they have been created, so any > custom fields you want to add have to be added when the index is > created. > > Atm RDig doesn't support custom fields, however I'd be happy to apply a > patch adding this capability ;-) > > -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Mon Feb 12 04:01:51 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 12 Feb 2007 10:01:51 +0100 Subject: [Ferret-talk] Adding extra fields to an index (using RDig?) In-Reply-To: References: <20070210173350.GA22582@cordoba.webit.de> Message-ID: <20070212090151.GO20182@cordoba.webit.de> On Sun, Feb 11, 2007 at 07:17:51PM +0100, Ed Ed wrote: > Hi, > > To summarise, I can add custom fields at create time but not afterwards. > Furthermore RDig does not presently support the addition of custom > fields. Right. > > Please could you post your patch to enable RDig to support custom > fields. oh, what I wanted to say is that if *you* built such a feature into RDig, I'd be happy to integrate it. Sorry if I've been unclear here. Jens From ed.temp.01 at gmail.com Mon Feb 12 06:55:54 2007 From: ed.temp.01 at gmail.com (Ed Ed) Date: Mon, 12 Feb 2007 12:55:54 +0100 Subject: [Ferret-talk] Adding extra fields to an index (using RDig?) In-Reply-To: <20070212090151.GO20182@cordoba.webit.de> References: <20070210173350.GA22582@cordoba.webit.de> <20070212090151.GO20182@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > oh, what I wanted to say is that if *you* built such a feature into > RDig, I'd be happy to integrate it. Sorry if I've been unclear here.
> :-( OK, I'll have a look at the code and see what might be simplest. Seems to me that adding an extra optional directive to the configuration file is easiest. This could name a file containing a user-supplied hook which rdig/indexer.rb could try to include. Or just define the hook procedure in the config file? Then if the hook procedure existed the indexer could pass it the document and doc data structure and the hook procedure could augment the doc structure as required. I guess the only Ferret requirement here is that the hook must add the same set of extra fields to each document (even if values NULL) Ed -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Mon Feb 12 07:49:53 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 12 Feb 2007 13:49:53 +0100 Subject: [Ferret-talk] Adding extra fields to an index (using RDig?) In-Reply-To: References: <20070210173350.GA22582@cordoba.webit.de> <20070212090151.GO20182@cordoba.webit.de> Message-ID: <20070212124953.GS20182@cordoba.webit.de> On Mon, Feb 12, 2007 at 12:55:54PM +0100, Ed Ed wrote: [..] > > OK, I'll have a look at the code and see what might be simplest. Seems > to me that adding an extra optional directive to the configuration file > is easiest. This could name a file containing a user-supplied hook which > rdig/indexer.rb could try to include. Or just define the hook procedure > in the config file? defining the hook method in the config sounds good. > Then if the hook procedure existed the indexer could pass it the > document and doc data structure and the hook procedure could augment the > doc structure as required. exactly. > I guess the only Ferret requirement here is that the hook must add the > same set of extra fields to each document (even if values NULL) not even that, you can have different ferret documents with a different set of fields. Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From starburger234 at yahoo.de Mon Feb 12 13:05:36 2007 From: starburger234 at yahoo.de (starburger) Date: Mon, 12 Feb 2007 19:05:36 +0100 Subject: [Ferret-talk] Invalid char problem Message-ID: <0561a6963a56194cb5204c209e055410@ruby-forum.com> Has the problem with corrupted .rhtmls (invalid characters) been solved so far? I would like to use ferret and acts_as_ferret on Windows XP. I have installed 0.10.9 (mswin32) which still seems to have the problem. I am receiving error messages like:

compile error
C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:2: parse error, unexpected ')', expecting kEND
C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:3: parse error, unexpected tIDENTIFIER, expecting kEND
_erbout.concat "
"?; _erbout.concat(( @article.title ).to_s); _erbout.concat "
\n" ^
C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:4: Invalid char `\001' in expression
C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:5: Invalid char `\377' in expression
C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:7: parse error, unexpected $, expecting kEND
_erbout.concat " " ^

Would compiling the ferret extensions by myself be a solution? I have Visual C++ Express Edition installed. -- Posted via http://www.ruby-forum.com/. From wmorgan-ferret at masanjin.net Mon Feb 12 18:37:56 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Mon, 12 Feb 2007 15:37:56 -0800 Subject: [Ferret-talk] [ANN] sup 0.0.7 Released Message-ID: <1171323363-sup-5173@south> And now, for some news about the only Ferret project that doesn't involve Rails: sup version 0.0.7 has been released! http://sup.rubyforge.org Sup is a console-based email client that combines the best features of GMail, mutt, and emacs. Sup matches the power of GMail with the speed and simplicity of a console interface. Sup makes it easy to:

- Handle massive amounts of email.
- Mix email from different sources: mbox files (even across different machines), IMAP folders, POP accounts, and GMail accounts.

Changes:

== 0.0.7 / 2007-02-12

* Split sup-import into two bits: sup-import and sup-add.
* Command-line arguments now handled by trollop.
* Better error handling for IMAP and svn+ssh.
* Messages can now be moved between sources while preserving all message state.
* New commands in thread-view-mode:
  - 'a' to add an email to the addressbook
  - 'S' to search for all email to/from an email address
  - 'A' to kill buffer and archive thread in one swell foop
* Removed hoe dependency.
-- William From anilmrn at yahoo.com Mon Feb 12 19:13:11 2007 From: anilmrn at yahoo.com (anilmrn at yahoo.com) Date: 12 Feb 2007 16:13:11 -0800 Subject: [Ferret-talk] Join Anil M on Yahoo! Messenger! Message-ID: <20070213001847.969B6524206D@rubyforge.org> Anil M wants to talk with you using the new Yahoo! Messenger: Accept the invitation by clicking this link: http://invite.msg.yahoo.com/invite?op=accept&intl=us&sig=2gNqRcKiI_5YIw9KrCnjycuK9BRUglMniwqno2tC_8HbaQpr7z0v2F81V.99K.mLQbTo9eE8QolsSZhC1y0mlMiD5yqOhzAlkE9Lt79QAn25Oj_ieQQzXSMfdzo- With Yahoo! Messenger, you get: Free worldwide PC-to-PC calls.* All you need are speakers and a microphone (or a headset). If no one's there, leave a voicemail! IM Windows Live™ Messenger friends too. Add your Windows Live friends to your Yahoo! contact list. See when they're online and IM them anytime. Stealth settings keep you in control. Now you can get in touch on your time, by controlling who sees when you're online. So what are you waiting for? It's free. Get Yahoo! Messenger and start connecting how you want, when you want. * Emergency 911 calling services not available on Yahoo! Messenger. Please inform others who use your Yahoo! Messenger they must dial 911 through traditional phone lines or cell carriers. By using Yahoo! Messenger you agree to not use PC-to-PC calling in countries where prohibited. The above features apply to the Windows version of Yahoo! Messenger. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20070212/a917708d/attachment.html From dbalmain.ml at gmail.com Mon Feb 12 21:18:37 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 13 Feb 2007 13:18:37 +1100 Subject: [Ferret-talk] Adding entry breaks index In-Reply-To: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> References: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> Message-ID: Hi Sam, Do you think it would be possible to send me a copy of the index (if the data isn't sensitive)? It would be really helpful as I can't seem to reproduce the problem. I'm on Ubuntu here so I should be able to replicate the problem with the index. Cheers, Dave On 2/10/07, Sam wrote: > Our ferret 0.10.13 index has been slowly growing on our debian server > and has just got up over 14,000 records. Yesterday I randomly noticed > that one search I did was suddenly giving whack, unexpected results. I > have spent much time trying to track the problem. > > Tried ferret 0.10.9 - no change. > Tried on a windows machine - where it works fine, and doesn't give weird > results (which just adds to the strangeness - anyway I need it to work > on the debian server) > > narrowed it down to one single entry that when you add or delete from > the index completely changes results in unrelated searches. > a little console output shows this best.
>
> index = Ferret::Index::Index.new(FerretConfig::INDEXOPTIONS)
>
> puts index.search("westpac").total_hits
> 286
> puts index.search("westpac branch").total_hits
> 277
>
> doc = Entry.find(1094481).make_entry_ferret_doc
> => {:latitude1d=>"36.9", :address=>"61 Remuera Rd, Newmarket",
> :longitude1d=>"174.8", :name=>"Spiro's Florists", :precision=>"1
> number", :tags=>"Flowers, bouquets, gift baskets, permanent floral
> arrangements, inter-flora", :zid=>1094481}
> index << doc
> index.flush
> index.optimize
>
> puts index.search("westpac").total_hits
> 286
> puts index.search("westpac branch").total_hits
> 3
>
> index.delete("1094481")
> index.flush
> index.optimize
>
> puts index.search("westpac").total_hits
> 286
> puts index.search("westpac branch").total_hits
> 277
>
> I'm completely lost on this. It makes no sense to me at all. > Rebuilding the index doesn't help. It happens the same on 2 similar but > independent debian boxes. > > Anyone got any clues as to where to start? > While it's fine to just remove this entry and presume everything is > working - without knowing why this breaks it's pretty hard to have faith > in the index not breaking again... > > Really appreciate any thoughts, > Sam > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Dave Balmain http://www.davebalmain.com/ From bk at benjaminkrause.com Mon Feb 12 21:27:33 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Tue, 13 Feb 2007 03:27:33 +0100 Subject: [Ferret-talk] Adding entry breaks index In-Reply-To: References: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> <20070210085527.GA9589@cordoba.webit.de> Message-ID: <5697D9CF-2DF4-49A5-87BB-55E2C156C021@benjaminkrause.com> Hey Sam, dave said he is going to look into this in the near future.. We'll hopefully get some information about your problem soon.
Ben > I guessed that it was something to do with the tags field, removing it > before adding the doc made everything ok - so I played with > changing the > values in the tags field. I narrowed it down to this. From samuelgiffney at gmail.com Tue Feb 13 03:36:47 2007 From: samuelgiffney at gmail.com (Sam) Date: Tue, 13 Feb 2007 09:36:47 +0100 Subject: [Ferret-talk] Adding entry breaks index In-Reply-To: References: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> Message-ID: <679deb4b5ce0e1eddeea608fc2c8ce34@ruby-forum.com> It's open source data so no problem there. Index sent off list... -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue Feb 13 03:41:56 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 13 Feb 2007 09:41:56 +0100 Subject: [Ferret-talk] Invalid char problem In-Reply-To: <0561a6963a56194cb5204c209e055410@ruby-forum.com> References: <0561a6963a56194cb5204c209e055410@ruby-forum.com> Message-ID: <20070213084156.GB5563@cordoba.webit.de> Hi! On Mon, Feb 12, 2007 at 07:05:36PM +0100, starburger wrote: > Has the problem with corrupted .rhtmls (invalid characters) been solved > so far? > > I would like to use ferret and acts_as_ferret on Windows XP. I have > installed 0.10.9 (mswin32) which still seems to have the problem. Afair the safest bet was to not use Tabs in your rhtml templates. I don't know if there ever was a 'real' solution to this problem. Jens > I am receiving error messages like: > > compile error > C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:2: > parse error, unexpected ')', expecting kEND > C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:3: > parse error, unexpected tIDENTIFIER, expecting kEND > _erbout.concat "
"?; _erbout.concat(( > @article.title ).to_s); _erbout.concat "
\n" > ^ > C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:4: > Invalid char `\001' in expression > C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:5: > Invalid char `\377' in expression > C:/INSTAN~1.4P1/INSTAN~1/rails_apps/travelogue/config/../app/views/article_editor/_header_read.rhtml:7: > parse error, unexpected $, expecting kEND > _erbout.concat " " > ^ > > Would compiling the ferret extensions by myself be a solution? I have > Visual C++ Express Edition installed. > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From dbalmain.ml at gmail.com Tue Feb 13 12:32:19 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 14 Feb 2007 04:32:19 +1100 Subject: [Ferret-talk] Adding entry breaks index In-Reply-To: <679deb4b5ce0e1eddeea608fc2c8ce34@ruby-forum.com> References: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> <679deb4b5ce0e1eddeea608fc2c8ce34@ruby-forum.com> Message-ID: On 2/13/07, Sam wrote: > It's open source data so no problem there. Index sent off list... Thanks Sam, problem fixed. Ben emailed me privately about this bug suggesting that it might be serious. He was quite correct. When I put out the fix for this it will require everyone to rebuild their indexes. I'm going to add another fix to get rid of the FileNotFound bug that a lot of people have been getting (yes, I've finally found the cause of this one) and then I'll put another release out.
I was going to make this change backwards compatible but since there is a bug in the current index format and everyone will need to rebuild anyway, I guess it probably isn't necessary. If anyone can't rebuild their indexes for some reason, please let me know and I'll try and come up with a solution before I put the next release out. Once these fixes are out and I'm happy I haven't introduced any new bugs I'll be releasing Ferret 1.0 so look out for it. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From srackham at methods.co.nz Tue Feb 13 20:37:30 2007 From: srackham at methods.co.nz (Stuart Rackham) Date: Wed, 14 Feb 2007 02:37:30 +0100 Subject: [Ferret-talk] acts as ferret creates redundant default index directory? Message-ID: <1ed3c54483447e129f726caadcefc877@ruby-forum.com> Hi I'm using a non-default index directory (acts_as_ferret :index_dir option) but the unused default (./index/) is still created at load time. The culprits are the two calls to ensure_directory in the init_index_basedir method in acts_as_ferret.rb that seem to be redundant. Removing the calls to init_index_basedir fixes the problem with no apparent side effects. Cheers, Stuart -- Posted via http://www.ruby-forum.com/. From samuelgiffney at gmail.com Wed Feb 14 20:43:09 2007 From: samuelgiffney at gmail.com (Sam Giffney) Date: Thu, 15 Feb 2007 02:43:09 +0100 Subject: [Ferret-talk] Adding entry breaks index In-Reply-To: References: <864bef133f5a17ad21e5f5a02d70ffca@ruby-forum.com> <679deb4b5ce0e1eddeea608fc2c8ce34@ruby-forum.com> Message-ID: <7d72ff5eaaef8e5f681ef6f6d1ffef73@ruby-forum.com> David Balmain wrote: > Once these fixes are out and I'm happy I haven't introduced any new > bugs I'll be releasing Ferret 1.0 so look out for it. Awesome Dave! Lovely to have you back. -- Posted via http://www.ruby-forum.com/.
From julioody at gmail.com Wed Feb 14 22:04:16 2007 From: julioody at gmail.com (Julio Cesar Ody) Date: Thu, 15 Feb 2007 14:04:16 +1100 Subject: [Ferret-talk] wildcard fields Message-ID: Hey all, is there a way to wildcard field searches? As in: - a document like {:title => 'foo', :description1 => 'bar', :description2 => 'bar2'} I'd search: index.search("description*: search query") I understand the example above is silly, but it's enough to make the question understandable :-) Thanks. -- Julio C. Ody http://rootshell.be/~julioody From bk at benjaminkrause.com Thu Feb 15 02:24:57 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Thu, 15 Feb 2007 08:24:57 +0100 Subject: [Ferret-talk] wildcard fields In-Reply-To: References: Message-ID: On 2007-02-15, at 04:04, Julio Cesar Ody wrote: > Hey all, > > is there a way to wildcard field searches? As in: > > - a document like {:title => 'foo', :description1 => 'bar', > :description2 => 'bar2'} > > I'd search: > > index.search("description*: search query") > > I understand the example above is silly, but it's enough to make the > question understandable :-) no, it's not possible, however, you can of course combine these queries, like searching for: description1|description2: search query Ben From ed.temp.01 at gmail.com Thu Feb 15 08:06:51 2007 From: ed.temp.01 at gmail.com (Ed Ed) Date: Thu, 15 Feb 2007 14:06:51 +0100 Subject: [Ferret-talk] Proximity searching in rdig ferret Message-ID: Lucene has a syntax "foo bar"~10 for finding foo within 10 words of bar. Does ferret support this feature? (the ~ is used for fuzzy queries) Does rdig? This could be a deal breaker for me 'cos I really need proximity searches -- Posted via http://www.ruby-forum.com/.
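Regarding the wildcard fields thread above: Ben's workaround of listing the fields explicitly can be automated. The sketch below is a hypothetical helper (`expand_field_wildcard` is not part of Ferret) that expands a shell-style pattern such as `description*` against a list of field names you supply yourself and builds a query string in the `field1|field2:` form Ben shows. It wraps the query in parentheses on the assumption that Ferret's query parser scopes a field prefix over a parenthesised group, as the `subject:(blah blah)` query elsewhere in this archive suggests; check the QueryParser documentation before relying on it.

```ruby
# Hypothetical helper, not part of Ferret: expand a shell-style field
# pattern such as "description*" into the "field1|field2:" syntax.
def expand_field_wildcard(pattern, field_names, query)
  # Escape everything, then turn the escaped "*" back into ".*".
  source = Regexp.escape(pattern).gsub("\\*", ".*")
  matcher = Regexp.new("\\A#{source}\\z")

  matches = field_names.map(&:to_s).select { |name| matcher.match?(name) }
  raise ArgumentError, "no field matches #{pattern.inspect}" if matches.empty?

  # Parenthesise the query so the field list is meant to cover every term.
  "#{matches.join('|')}:(#{query})"
end

fields = [:title, :description1, :description2]
puts expand_field_wildcard("description*", fields, "search query")
# prints: description1|description2:(search query)
```

The resulting string would then be passed to `index.search` as in Julio's example; the field list has to come from your own document definitions, since this sketch does not read it from the index.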
From ed.temp.01 at gmail.com Thu Feb 15 08:10:47 2007 From: ed.temp.01 at gmail.com (Ed Ed) Date: Thu, 15 Feb 2007 14:10:47 +0100 Subject: [Ferret-talk] Proximity searching in rdig ferret In-Reply-To: References: Message-ID: <67adfd7b1f645550977adac3d65c3f22@ruby-forum.com> Of course it works - stupid boy! Ed Ed wrote: > Lucene has a syntax "foo bar"~10 for finding foo within 10 words of bar. > > Does ferret support this feature? (the ~ is used for fuzzy queries) Does > rdig? > > This could be a deal breaker for me 'cos I really need proximity > searches -- Posted via http://www.ruby-forum.com/. From ed.temp.01 at gmail.com Thu Feb 15 08:35:14 2007 From: ed.temp.01 at gmail.com (Ed Ed) Date: Thu, 15 Feb 2007 14:35:14 +0100 Subject: [Ferret-talk] rdig wildcard searches Message-ID: <11ba01604188714833e14309392ad9c0@ruby-forum.com> Lucene has simple wildcard syntax supporting ? and * thus ruby could be matched by rub? r*by etc. This doesn't work using rdig on the command line e.g. rdig -c config.rb -q 'data:"ru?y"' gives RDig version 0.3.4 using Ferret 0.10.14 executing query >data:"ru?y"< Query: data:"ru y"~1 which is something entirely different. The ferret docs seem to imply that support for wildcard works as in lucene though I haven't tried it yet -- Posted via http://www.ruby-forum.com/. From bk at benjaminkrause.com Thu Feb 15 09:01:35 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Thu, 15 Feb 2007 15:01:35 +0100 (CET) Subject: [Ferret-talk] Proximity searching in rdig ferret In-Reply-To: References: Message-ID: <44210.212.227.62.4.1171548095.squirrel@orkland.homeunix.org> > Lucene has a syntax "foo bar"~10 for finding foo within 10 words of bar. sure .. 
take a look at this (taken from the API):

--snip--
query = SpanNearQuery.new(:slop => 2)
query << SpanTermQuery.new(:field, "quick")
query << SpanTermQuery.new(:field, "brown")
query << SpanTermQuery.new(:field, "fox")
# matches => "quick brown speckled sleepy fox" |______2______^
# matches => "quick brown speckled fox" |__1__^
# matches => "brown quick _____ fox" ^_____2_____|

A SpanNearQuery is like a combination between a PhraseQuery and a BooleanQuery. It matches sub-SpanQueries which are added as clauses but those clauses must occur within a slop edit distance of each other.

http://ferret.davebalmain.com/api/classes/Ferret/Search/Spans/SpanNearQuery.html
--snip--

and even better, you can force the order of the words.. the above example doesn't care about the order of the words, but if the order is important to you, you can build queries that will search for terms in order and a maximal distance between the words.. look at http://blog.omdb-beta.org/2007/1/16/brad_pitt Ben From bk at benjaminkrause.com Thu Feb 15 09:02:57 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Thu, 15 Feb 2007 15:02:57 +0100 (CET) Subject: [Ferret-talk] Proximity searching in rdig ferret In-Reply-To: <67adfd7b1f645550977adac3d65c3f22@ruby-forum.com> References: <67adfd7b1f645550977adac3d65c3f22@ruby-forum.com> Message-ID: <44310.212.227.62.4.1171548177.squirrel@orkland.homeunix.org> > Of course it works - stupid boy! it's okay to doubt ferret.. sooner or later you will be convinced ;) happy ferret'ing .. Ben From subbu at coredotcontinuum.com Thu Feb 15 13:53:43 2007 From: subbu at coredotcontinuum.com (Subbu Balakrishnan) Date: Thu, 15 Feb 2007 19:53:43 +0100 Subject: [Ferret-talk] Running the DRb script Message-ID: <3974446cbe9f26edbacbceca8ee0339d@ruby-forum.com> Hi, I seem to have rather silly problem. I'm trying to run the script for the DRb server in the acts_as_ferret trunk for setting up a centralized index server.
When I try to run script/runner vendor/plugins/acts_as_ferret/script/ferret_server, I get a ruby error

/opt/csw/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27: undefined local variable or method `vendor' for # (NameError)
        from /opt/csw/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `eval'
        from /opt/csw/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27
        from /opt/csw/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require'
        from /opt/csw/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require'
        from script/runner:3

I'm guessing that eval is not able to run the ferret_server script. Is there something basic that I'm doing wrong here? Cheers -- Posted via http://www.ruby-forum.com/. From rohatgi83 at google.com Thu Feb 15 21:07:05 2007 From: rohatgi83 at google.com (amit) Date: Fri, 16 Feb 2007 03:07:05 +0100 Subject: [Ferret-talk] Highlight raises Segmentation Fault Error in Ferret 0.10.9 i Message-ID: Hi Everyone, I am currently in process of upgrading Ferret 0.9.1 to Ferret 0.10.9 with Ruby 1.8.4 in windows platform. I am trying to use search highlight feature on index_searcher subject = index_searcher.highlight("subject:(blah blah)", 0, :field => :content, :pre_tag = "", :post_tag = "") But all my tests fails and I run into Segmentation Fault Error. Does anyone has come across this bug? I see couple of emails in this forum but I have not been able to find solutions. Can some one please help me and tell me what is the correct implementation ? -Amit -- Posted via http://www.ruby-forum.com/.
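About the `script/runner` NameError in the "Running the DRb script" thread above: the trace shows runner.rb:27 calling `eval`, which is consistent with Rails 1.1.6's runner evaluating its argument as Ruby source rather than loading it as a file (an assumption, not checked against the Rails source). If so, `vendor/plugins/acts_as_ferret/script/ferret_server` is parsed as a chain of divisions whose first operand, `vendor`, is undefined. The plain-Ruby sketch below reproduces that failure mode with a hypothetical stand-in, `run_like_old_runner` (not Rails code), and shows why passing Ruby code, such as a `require` string, instead of a path avoids it.

```ruby
# Hypothetical stand-in for a runner that evals its argument as Ruby
# source (assumed behaviour of Rails 1.1.6's script/runner; not Rails code).
def run_like_old_runner(code)
  eval(code)
rescue NameError => e
  # Report which name Ruby could not resolve.
  "NameError: undefined name `#{e.name}'"
end

# A path is parsed as vendor / plugins / acts_as_ferret / script / ferret_server,
# and evaluating the first operand, `vendor`, fails.
puts run_like_old_runner("vendor/plugins/acts_as_ferret/script/ferret_server")
# prints: NameError: undefined name `vendor'

# Ruby code, by contrast, evaluates normally.
puts run_like_old_runner("'ferret_server'.length")
# prints: 13
```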
From amit_rohatgi at yahoo.com Thu Feb 15 21:03:08 2007 From: amit_rohatgi at yahoo.com (Amit) Date: Thu, 15 Feb 2007 18:03:08 -0800 (PST) Subject: [Ferret-talk] Highlight raises Segmentation Fault Error in Ferret 0.10.9 in Windows Message-ID: <403779.47597.qm@web38803.mail.mud.yahoo.com> Hi Everyone, I am currently in process of upgrading Ferret 0.9.1 to Ferret 0.10.9 with Ruby 1.8.4 in windows platform. I am trying to use search highlight feature on index_searcher subject = index_searcher.highlight("subject:(blah blah)", 0, :field => :content, :pre_tag = "", :post_tag = "") But all my tests fails and I run into Segmentation Fault Error. Does anyone has come across this bug? I see couple of emails in this forum but I have not been able to find solutions. Can some one please help me and tell me what is the correct implementation ? -Amit -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20070215/ed4139e1/attachment.html From dbalmain.ml at gmail.com Thu Feb 15 22:08:34 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 16 Feb 2007 14:08:34 +1100 Subject: [Ferret-talk] Highlight raises Segmentation Fault Error in Ferret 0.10.9 i In-Reply-To: References: Message-ID: On 2/16/07, amit wrote: > Hi Everyone, > > I am currently in process of upgrading Ferret 0.9.1 to Ferret 0.10.9 > with Ruby 1.8.4 in windows platform. I am trying to use search highlight > feature on index_searcher > > subject = index_searcher.highlight("subject:(blah blah)", 0, > :field => :content, > :pre_tag = "", > > :post_tag = "") > > But all my tests fails and I run into Segmentation Fault Error. > Does anyone has come across this bug?
I see couple of emails in this > forum but > I have not been able to find solutions. Can some one please help me and > tell me what is the correct implementation ? > > -Amit It looks like you are using IndexSearcher#highlight. The first argument must be a Query object rather than a String. If you want to use a String then you should use Index#highlight. Hope that helps. -- Dave Balmain http://www.davebalmain.com/ From sjoonk at gmail.com Fri Feb 16 01:48:42 2007 From: sjoonk at gmail.com (sjoonk) Date: Fri, 16 Feb 2007 07:48:42 +0100 Subject: [Ferret-talk] Segmentation fault in Search::Searcher#highlight In-Reply-To: <8a17d04d0bf869334b35bd0665e72a10@ruby-forum.com> References: <8a17d04d0bf869334b35bd0665e72a10@ruby-forum.com> Message-ID: sjoonk wrote:
> I'm using ferret 0.10.14 in Linux Fedora 3.
> When I do highlight with Index::Index#highlight, it works well.
> But, doing the same test with Searcher#highlight,
> [BUG] Segmentation fault occurred.
>
> Here's my test code.
>
> require 'rubygems'
> require 'ferret'
> include Ferret::Search
>
> #searcher = Ferret::Index::Index.new(:path => './index') # works
> searcher = Searcher.new("./index") # not works! segmentation fault!!
>
> query = TermQuery.new(:content, ARGV[0])
>
> searcher.search_each(query) do |doc_id, score|
>   puts "Document #{doc_id} found with a score of #{score}"
>   puts searcher.highlight(query, doc_id, :field => :content)
> end
>
> Do I have some wrong implementation? Help me...

I found a solution (a trick). When I put the 4th parameter to Searcher#highlight method, it works! I don't know why. Maybe it's a bug... I think. Anyway, here's my solution.

puts searcher.highlight(query, doc_id, :content, {})

Enjoy Ferret! -- Posted via http://www.ruby-forum.com/.
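Both crashes in this thread come from handing `Searcher#highlight` an argument of the wrong type: a String query (Dave's diagnosis) or an options Hash in the position where the field name goes (sjoonk's original call). Until the C extension checks its arguments itself, a thin Ruby guard can turn those mistakes into an `ArgumentError` instead of a segfault. The sketch below is hypothetical (`checked_highlight` is not a Ferret method); it assumes the positional `highlight(query, doc_id, field, options)` form used in sjoonk's working call, and it only accepts Symbols for the field because that is what the examples here use.

```ruby
# Hypothetical guard, not part of Ferret: validate arguments in Ruby
# before they reach the C extension's Searcher#highlight.
def checked_highlight(searcher, query, doc_id, field, options = {})
  if query.is_a?(String)
    raise ArgumentError, "highlight needs a Query object, not a String " \
                         "(parse it first, or use Index#highlight)"
  end
  # The examples in this thread pass field names as Symbols (:content);
  # a Hash here is the mistake from the original crashing call.
  unless field.is_a?(Symbol)
    raise ArgumentError, "field must be a Symbol such as :content, got #{field.class}"
  end
  # Same positional form as the working call: (query, doc_id, field, {})
  searcher.highlight(query, doc_id, field, options)
end
```

With a real searcher this would be called as `checked_highlight(searcher, query, doc_id, :content)` inside the `search_each` loop above; that call has not been tested against Ferret here.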
From kraemer at webit.de Fri Feb 16 04:14:03 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 16 Feb 2007 10:14:03 +0100 Subject: [Ferret-talk] Running the DRb script In-Reply-To: <3974446cbe9f26edbacbceca8ee0339d@ruby-forum.com> References: <3974446cbe9f26edbacbceca8ee0339d@ruby-forum.com> Message-ID: <20070216091403.GV5563@cordoba.webit.de> On Thu, Feb 15, 2007 at 07:53:43PM +0100, Subbu Balakrishnan wrote: > Hi, > > I seem to have rather silly problem. I'm trying to run the script for > the DRb server in the acts_as_ferret trunk for setting up a centralized > index server. When I try to run script/runner > vendor/plugins/acts_as_ferret/script/ferret_server, I get a ruby error > > /opt/csw/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27: > undefined local variable or method `vendor' for # > (NameError) > from > /opt/csw/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `eval' > from > /opt/csw/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27 I'm not sure but it might be that it won't work with 1.1.6. At least I didn't test with this version of Rails. You could try to copy the script to RAILS_ROOT/lib and just do script/runner "require 'ferret_server'" maybe this will work. Jens -- Jens Krämer webit!
Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From ahfeel_nospam_ at rift.fr Fri Feb 16 04:33:55 2007 From: ahfeel_nospam_ at rift.fr (ahFeel) Date: Fri, 16 Feb 2007 10:33:55 +0100 Subject: [Ferret-talk] Bug in IndexSearcher with limit => all and any offset Message-ID: <610758401e46d038dbbdb3383f67a6e9@ruby-forum.com> Here's the deal:

static TopDocs *isea_search_w(Searcher *self,
                              Weight *weight,
                              int first_doc,   // OFFSET
                              int num_docs,    // LIMIT
                              Filter *filter,
                              Sort *sort,
                              filter_ft filter_func,
                              bool load_fields)
{
    int max_size = first_doc + num_docs;

Actually, when you have limit => :all, num_docs equals to INT_MAX, so adding a value to it makes a nice int overflow :/ The diff patch is here: http://pastie.caboo.se/40748 I've told Dave by mail but it seems like he's very busy lately, hope someone else can release some fix here :) -- Jérémie 'ahFeel' BORDIER Rift Technologies - http://www.rift.fr -- Posted via http://www.ruby-forum.com/.
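ahFeel's report can be illustrated without Ferret. Ruby integers never overflow, so the sketch below emulates a 32-bit C `int`, assuming two's-complement wrap-around and `INT_MAX` = 2**31 - 1 (strictly, signed overflow is undefined behaviour in C, so real compilers are not obliged to wrap). It shows what `first_doc + num_docs` becomes when `:limit => :all` sets `num_docs` to `INT_MAX`, plus one possible clamping guard. The guard is a sketch of the idea, not the contents of the linked patch.

```ruby
# Assumed 32-bit C int limit.
INT_MAX = 2**31 - 1

# Emulate two's-complement wrap-around of a 32-bit signed int.
def wrap_int32(n)
  n &= 0xFFFF_FFFF
  n >= 2**31 ? n - 2**32 : n
end

# What `int max_size = first_doc + num_docs;` yields on a wrapping platform.
def naive_max_size(first_doc, num_docs)
  wrap_int32(first_doc + num_docs)
end

# One possible guard: clamp to INT_MAX before the addition can overflow.
def clamped_max_size(first_doc, num_docs)
  num_docs > INT_MAX - first_doc ? INT_MAX : first_doc + num_docs
end

p naive_max_size(10, INT_MAX)    # => -2147483639
p clamped_max_size(10, INT_MAX)  # => 2147483647
```

A negative `max_size` is what makes any non-zero offset combined with `:limit => :all` misbehave; the same comparison-before-addition check could be written directly in C.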
From caleb at inforadical.net Fri Feb 16 17:52:28 2007 From: caleb at inforadical.net (Caleb Clausen) Date: Fri, 16 Feb 2007 14:52:28 -0800 Subject: [Ferret-talk] term vector blues Message-ID: <45D635AC.9090002@inforadical.net> I have a lot of crashes when I try to use term vectors. Here's an example, which crashes pretty consistently. This problem seems to be somewhat sensitive to platform... people on other OS's and ruby versions have reported no error. I have seen this with ferret 0.10.13 and 0.10.14 on debian stable using ruby 1.8.2, but I have observed the same problem on various other systems as well. I've reported this issue here before, but it was when David was gone.

program:

require 'rubygems'
require 'ferret'
#require 'zlib'


fields=Ferret::Index::FieldInfos.new
fields.add_field :text, :store=>:no#, :index=>:omit_norms
i = Ferret::I.new :field_infos=>fields #:path=>'temp_index'

20.times{
  i << {:text=>`man gcc`[0..135000]}
}
#i.close_writer
r=i.reader
#r.term_docs_for(:text, "example")

r.term_vector(0,:text)

example output:

$ ruby tvtest.rb
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
Reformatting gcc(1), please wait...
tvtest.rb:16: [BUG] Segmentation fault ruby 1.8.2 (2005-04-11) [i386-linux] Aborted From ferret.5.micboh at spamgourmet.com Mon Feb 19 01:50:42 2007 From: ferret.5.micboh at spamgourmet.com (Joe Mestople) Date: Mon, 19 Feb 2007 07:50:42 +0100 Subject: [Ferret-talk] Ferret seg-faulting during search Message-ID: <1b315155becffc85175d3950a858baaf@ruby-forum.com> Hi, I'm using ferret and running into troubles with it seg faulting during searches. The index I'm searching is static and is only updated in an offline way once every couple weeks. The segfault isn't deterministically reproducible, but if I hammer ferret hard enough I can reliably get it to crash. The problem seems to have something to do with how memory is shared between Ruby and Ferret's C code, for if I disable Ruby garbage collection the crashes go away. I can try to provide a repro, but I think it might be faster to do some of the initial investigation on my own, even though I have no experience with Ferret internals. Can someone give me some pointers on debugging ferret? I assume I need to build a debug version out of SVN, but do I need debug version or ruby as well? Thanks, Joe -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Mon Feb 19 03:52:48 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 19 Feb 2007 09:52:48 +0100 Subject: [Ferret-talk] term vector blues In-Reply-To: <45D635AC.9090002@inforadical.net> References: <45D635AC.9090002@inforadical.net> Message-ID: <20070219085248.GZ5563@cordoba.webit.de> On Fri, Feb 16, 2007 at 02:52:28PM -0800, Caleb Clausen wrote: > I have a lot of crashes when I try to use term vectors. Here's an > example, which crashes pretty consistently. This problem seems to be > somewhat sensitive to platform... people on other OS's and ruby versions > have reported no error. I have seen this with ferret 0.10.13 and 0.10.14 > on debian stable using ruby 1.8.2, but I have observed the same problem > on various other systems as well. 
I've reported this issue here before, > but it was when David was gone. > > program: > > require 'rubygems' > require 'ferret' > #require 'zlib' > > > fields=Ferret::Index::FieldInfos.new > fields.add_field :text, :store=>:no#, :index=>:omit_norms > i = Ferret::I.new :field_infos=>fields #:path=>'temp_index' > > 20.times{ > i << {:text=>`man gcc`[0..135000]} > } > #i.close_writer > r=i.reader > #r.term_docs_for(:text, "example") > > r.term_vector(0,:text) > > [..] > tvtest.rb:16: [BUG] Segmentation fault > ruby 1.8.2 (2005-04-11) [i386-linux] > > Aborted same here with Ubuntu 6.10 / Ruby 1.8.4. Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Mon Feb 19 04:00:41 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 19 Feb 2007 10:00:41 +0100 Subject: [Ferret-talk] find conditions in more_like_this In-Reply-To: <5f0b9559b8ec9cdca796316536081256@ruby-forum.com> References: <5f0b9559b8ec9cdca796316536081256@ruby-forum.com> Message-ID: <20070219090041.GA16225@cordoba.webit.de> On Fri, Feb 16, 2007 at 07:04:45PM +0100, Jason Hines wrote: > > Hello. > > I'm trying to use acts_as_ferret to index with set conditions. > > Ideally I could do something like: > > acts_as_ferret :fields => [ :title, :body ], > :conditions => ["enabled = 1"] > > But would settle for being able to do: > > @similiar_blogs = @blog.more_like_this :field_names => [ :title, :body > ], :conditions => "enabled=1" > > What is the best way of accomplishing this with using more_like_this, or > even better -- applying these conditions to the model to be indexed > globally. you could override the ferret_enabled? instance method to only return true if your condition is met. cheers, Jens -- Jens Krämer webit!
Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From bk at benjaminkrause.com Mon Feb 19 04:45:32 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Mon, 19 Feb 2007 10:45:32 +0100 Subject: [Ferret-talk] term vector blues In-Reply-To: <20070219085248.GZ5563@cordoba.webit.de> References: <45D635AC.9090002@inforadical.net> <20070219085248.GZ5563@cordoba.webit.de> Message-ID: <4D0F47D7-2025-4B5A-8338-A29F3D03754E@benjaminkrause.com> >> tvtest.rb:16: [BUG] Segmentation fault >> ruby 1.8.2 (2005-04-11) [i386-linux] >> >> Aborted > > same here with Ubuntu 6.10 / Ruby 1.8.4. no problem on MacOSX 10.4, ruby 1.8.5, ferret (0.10.14) From john at johnleach.co.uk Mon Feb 19 06:39:35 2007 From: john at johnleach.co.uk (John Leach) Date: Mon, 19 Feb 2007 11:39:35 +0000 Subject: [Ferret-talk] Ferret seg-faulting during search In-Reply-To: <1b315155becffc85175d3950a858baaf@ruby-forum.com> References: <1b315155becffc85175d3950a858baaf@ruby-forum.com> Message-ID: <1171885175.26677.7.camel@localhost.localdomain> Hi Joe, I've experienced lots of segfaults too. Last I heard from David Balmain (Ferret's author) was that he knew what was causing it and is fixing it. That was last week, but he's been pretty busy with other things lately so I guess he can't predict when a new release will be due. John. -- http://johnleach.co.uk On Mon, 2007-02-19 at 07:50 +0100, Joe Mestople wrote: > Hi, > > I'm using ferret and running into troubles with it seg faulting during > searches. The index I'm searching is static and is only updated in an > offline way once every couple weeks. > > The segfault isn't deterministically reproducible, but if I hammer > ferret hard enough I can reliably get it to crash. 
The problem seems to > have something to do with how memory is shared between Ruby and Ferret's > C code, for if I disable Ruby garbage collection the crashes go away. > > I can try to provide a repro, but I think it might be faster to do some > of the initial investigation on my own, even though I have no experience > with Ferret internals. Can someone give me some pointers on debugging > ferret? I assume I need to build a debug version out of SVN, but do I > need debug version or ruby as well? > > Thanks, > Joe >

From patched at sourfamily.com Mon Feb 19 11:42:05 2007 From: patched at sourfamily.com (Gregg Pollack) Date: Mon, 19 Feb 2007 17:42:05 +0100 Subject: [Ferret-talk] Acts_As_Ferret Tutorial Message-ID: <52f25102252c62a12f9457ce5cf91c00@ruby-forum.com>
Hey guys, I wanted to share with you guys a detailed tutorial I just finished this weekend for using Acts_As_Ferret. http://www.railsenvy.com/2007/2/19/acts-as-ferret-tutorial I started using Ferret and Acts_As_Ferret a few weeks ago, and I learned so much that I wanted to give back to the community by writing up a helpful tutorial that covers all the key topics in one place (Something I wish I would have had when I started using it). There are a few tutorials already out there, but none that really lay it all out. Thank you guys for creating a great search tool, please let me know if there is anything I got wrong, or that I should add. -- Gregg Pollack "All of us could take a lesson from the weather. It pays no attention to criticism"

From kraemer at webit.de Mon Feb 19 12:15:32 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 19 Feb 2007 18:15:32 +0100 Subject: [Ferret-talk] Acts_As_Ferret Tutorial In-Reply-To: <52f25102252c62a12f9457ce5cf91c00@ruby-forum.com> References: <52f25102252c62a12f9457ce5cf91c00@ruby-forum.com> Message-ID: <20070219171532.GC16225@cordoba.webit.de>
Hi!
On Mon, Feb 19, 2007 at 05:42:05PM +0100, Gregg Pollack wrote: > Hey guys, > > I wanted to share with you guys a detailed tutorial I just > finished this weekend for using Acts_As_Ferret. > > http://www.railsenvy.com/2007/2/19/acts-as-ferret-tutorial > > I started using Ferret and Acts_As_Ferret a few weeks ago, and I > learned so much that I wanted to give back to the community by writing > up a helpful tutorial that covers all the key topics in one place > (Something I wish I would have had when I started using it). There are > a few tutorials already out there, but none that really lay it all out. Great work! > Thank you guys for creating a great search tool, please let me > know if there is anything I got wrong, or that I should add. I think you missed the doc_id parameter in your highlighting code sample, but otherwise it's pretty perfect :-) cheers, Jens

From patched at sourfamily.com Mon Feb 19 12:55:06 2007 From: patched at sourfamily.com (Gregg Pollack) Date: Mon, 19 Feb 2007 18:55:06 +0100 Subject: [Ferret-talk] Acts_As_Ferret Tutorial In-Reply-To: <20070219171532.GC16225@cordoba.webit.de> References: <52f25102252c62a12f9457ce5cf91c00@ruby-forum.com> <20070219171532.GC16225@cordoba.webit.de> Message-ID:
Thanks Jens, and thanks for the heads up on the missing doc. I've fixed the syntax. -Gregg http://www.railsenvy.com Jens Kraemer wrote: > Hi! > > On Mon, Feb 19, 2007 at 05:42:05PM +0100, Gregg Pollack wrote: >> (Something I wish I would have had when I started using it). There are >> a few tutorials already out there, but none that really lay it all out. > > Great work! > >> Thank you guys for creating a great search tool, please let me >> know if there is anything I got wrong, or that I should add.
> > I think you missed the doc_id parameter in your highlighting code > sample, but otherwise it's pretty perfect :-) > > cheers, > Jens

From slyris at gmail.com Mon Feb 19 13:44:07 2007 From: slyris at gmail.com (Sonia Lyris) Date: Mon, 19 Feb 2007 10:44:07 -0800 Subject: [Ferret-talk] Searching for terms in free-form text Message-ID: <45D9EFF7.8020504@gmail.com>
What is the best way to search a (possibly long) string of free-form text (like, say, an email) for occurrences of some set of key phrases of interest? Fuzzy or not; I'll take what I can get. Thanks in advance! -- Sonia Lyris | slyris at gmail.com

From pritchie at videotron.ca Mon Feb 19 16:00:57 2007 From: pritchie at videotron.ca (Patrick Ritchie) Date: Mon, 19 Feb 2007 16:00:57 -0500 Subject: [Ferret-talk] term vector blues In-Reply-To: <4D0F47D7-2025-4B5A-8338-A29F3D03754E@benjaminkrause.com> References: <45D635AC.9090002@inforadical.net> <20070219085248.GZ5563@cordoba.webit.de> <4D0F47D7-2025-4B5A-8338-A29F3D03754E@benjaminkrause.com> Message-ID: <45DA1009.50307@videotron.ca>
Benjamin Krause wrote:
>>> tvtest.rb:16: [BUG] Segmentation fault
>>> ruby 1.8.2 (2005-04-11) [i386-linux]
>>>
>>> Aborted
>>>
>> same here with Ubuntu 6.10 / Ruby 1.8.4.
>>

Same on cygwin / Ruby 1.8.5, BUT if I turn off garbage collection (GC.disable) it doesn't crash. I think this is related to: http://rubyforge.org/pipermail/ferret-talk/2007-February/002504.html and others... which David said he is working on. The following script always seems to die at the same point on my machine and may provide some extra insight.
require 'rubygems'
require 'ferret'

fields = Ferret::Index::FieldInfos.new
fields.add_field :text, :store => :no

#GC.disable

s = `man gcc`
ix = 0
s.scan(/./m) do |c|
  puts "#{ix}: #{c}"
  i = Ferret::I.new :field_infos => fields
  i << {:text => s[0..ix+=1]}
  tv = i.reader.term_vector(0, :text)
end

Dies on character 357 on my machine... Cheers! Patrick

From kraemer at webit.de Tue Feb 20 08:14:53 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 20 Feb 2007 14:14:53 +0100 Subject: [Ferret-talk] Searching for terms in free-form text In-Reply-To: <45D9EFF7.8020504@gmail.com> References: <45D9EFF7.8020504@gmail.com> Message-ID: <20070220131453.GD16225@cordoba.webit.de>
Hi! On Mon, Feb 19, 2007 at 10:44:07AM -0800, Sonia Lyris wrote: > What is the best way to search a (possibly long) string of free-form > text (like, say, an email) for occurrances of some set of key phrases > of interest? Fuzzy or not; I'll take what I can get. could you be a little bit more specific about what you want to do? For finding out if a phrase occurs in a single piece of text a regex could be sufficient, but I doubt that's what you're asking for :-) of course you can do this with ferret, too:

require 'rubygems'
require 'ferret'

index = Ferret::I.new
index << 'your text here'
top_docs = index.search 'text'
puts 'found text' if top_docs.total_hits > 0

Jens

From john at mightytofu.com Tue Feb 20 09:21:21 2007 From: john at mightytofu.com (John) Date: Tue, 20 Feb 2007 15:21:21 +0100 Subject: [Ferret-talk] ferret webpage down Message-ID: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com>
The ferret webpage at http://ferret.davebalmain.com/ has been down for a number of days. Any idea what's going on? or how to notify the webmaster?

From bk at benjaminkrause.com Tue Feb 20 09:55:04 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Tue, 20 Feb 2007 15:55:04 +0100 (CET) Subject: [Ferret-talk] ferret webpage down In-Reply-To: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> Message-ID: <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org>
> The ferret webpage at http://ferret.davebalmain.com/ has been down for a > number of days. Any idea what's going on? or how to notify the > webmaster? maybe dave is working on that page? i guess he's aware of the problem, and as he said, he's planning to release 1.0 in the near future.. so it wouldn't surprise me if he's currently working on that page.. anyway, he is reading this list, so he's notified by now :-) Ben

From caleb at inforadical.net Tue Feb 20 23:13:35 2007 From: caleb at inforadical.net (Caleb Clausen) Date: Tue, 20 Feb 2007 20:13:35 -0800 Subject: [Ferret-talk] term vector blues Message-ID: <45DBC6EF.5050700@inforadical.net>
Some more on this issue. I can narrow the crash down to an index with just one document, and ferret crashes after getting its term vector a number of times, perhaps as few as 2 or 3.
The version below is tuned to crash quickly on my system; others may find it necessary to give other numbers in the command line argument in order to make the crash happen sooner. See below for the code. Some fiddling with the ferret source code reveals that disabling the call to tv_destroy() in frt_ir_term_vector() (which is the implementation of #term_vector) seems to make the problem go away. Experimenting with tv_destroy(), I found that disabling just the frees of offsets and positions is enough to keep the crash away. This suggests that there's a mismanagement of the allocation of the associated variables, but if so I was unable to spot it in the source.... This is further than I've gotten in investigating this problem in a while, but I'm unsure where to go next.

require 'rubygems'
require 'ferret'

fields = Ferret::Index::FieldInfos.new
fields.add_field :text, :store => :no

scale=(ARGV.first||662).to_i #rand(1000)
s = {:text => "foo bar baz "*scale }
i = Ferret::I.new :field_infos => fields
i << s

9999999999.times do|j|
  tv = i.reader.term_vector(0, :text)
  print "."; STDOUT.flush
end

From john at mightytofu.com Wed Feb 21 07:01:17 2007 From: john at mightytofu.com (John) Date: Wed, 21 Feb 2007 13:01:17 +0100 Subject: [Ferret-talk] ferret webpage down In-Reply-To: <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> Message-ID: <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com>
Looks like it's still down. I need some information from that site. I'm not sure if Dave is notified though. Benjamin Krause wrote: >> The ferret webpage at http://ferret.davebalmain.com/ has been down for a >> number of days. Any idea what's going on? or how to notify the >> webmaster? > > maybe dave is working on that page? i guess he's aware of the problem, > and > as he said, he's planning to release 1.0 in the near future..
so it > wouldn't surprise me if he's currently working on that page.. > > anyway, he is reading this list, so he's notified by now :-) > > Ben

From bk at benjaminkrause.com Wed Feb 21 07:11:01 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Wed, 21 Feb 2007 13:11:01 +0100 Subject: [Ferret-talk] ferret webpage down In-Reply-To: <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> Message-ID: <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com>
On 2007-02-21, at 13:01, John wrote: > Looks like it's still down. I need some information from that site. > I'm > not sure if Dave is notified though. what do you need? Maybe someone on the list can help you .. you can still access the API via http://ferret.davebalmain.com/api/ Ben

From jan.prill at gmail.com Wed Feb 21 12:03:08 2007 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 21 Feb 2007 17:03:08 +0000 Subject: [Ferret-talk] ferret webpage down In-Reply-To: <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com> References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com> Message-ID: <562a35c10702210903qacdd6d2l70338f096c62ecdf@mail.gmail.com>
Or maybe you are in luck and the waybackmachine or google-cache has what you need: http://web.archive.org/web/20060515035404/ferret.davebalmain.com/trac Cheers, Jan On 2/21/07, Benjamin Krause wrote: > > > On 2007-02-21, at 13:01, John wrote: > > > Looks like it's still down. I need some information from that site. > > I'm > > not sure if Dave is notified though. > > what do you need? Maybe someone on the list can help you ..
> > you can still access the API via http://ferret.davebalmain.com/api/ > > Ben -- Rechtsanwalt Grünebergstraße 38 22763 Hamburg Tel +49 (0)40 41265809 Fax +49 (0)40 380178-73022 Mobil +49 (0)171 3516667

From john at mightytofu.com Wed Feb 21 18:17:04 2007 From: john at mightytofu.com (John) Date: Thu, 22 Feb 2007 00:17:04 +0100 Subject: [Ferret-talk] ferret webpage down In-Reply-To: <562a35c10702210903qacdd6d2l70338f096c62ecdf@mail.gmail.com> References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com> <562a35c10702210903qacdd6d2l70338f096c62ecdf@mail.gmail.com> Message-ID:
Thanks for the cache links. I wanted to find out if there is information related to unicode support in Ferret. Anyone here have any info regarding that? Jan Prill wrote: > Or maybe you are in luck and the waybackmachine or google-cache has what > you > need: > http://web.archive.org/web/20060515035404/ferret.davebalmain.com/trac > > Cheers, > Jan
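John's question largely comes down to bytes versus characters. The Ruby sketch below is not Ferret code: `toy_tokens` is a hypothetical helper, not Ferret's Analyzer API, and it assumes a modern Ruby (2.4 or later) in which `String#downcase` and the `[[:alnum:]]` class understand non-ASCII letters. It only illustrates why a tokenizer has to respect the string's encoding rather than walk raw bytes.

```ruby
# encoding: utf-8
# Toy tokenizer for illustration only; NOT Ferret's Analyzer API.
# Assumes Ruby 2.4+, where String#downcase handles non-ASCII letters.

# Lowercase the text, then pull out runs of letters and digits.
# In a UTF-8 string, [[:alnum:]] matches Unicode letters such as "ü" and "ß".
def toy_tokens(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

address = "Grünebergstraße 38, Hamburg"

puts address.length    # => 27 (characters)
puts address.bytesize  # => 29 ("ü" and "ß" take two bytes each in UTF-8)
puts toy_tokens(address).join(" | ")
# prints: grünebergstraße | 38 | hamburg
```

A splitter that treated every byte outside A-Z and a-z as a separator would cut "Grünebergstraße" into fragments; keeping that decision in an encoding-aware analyzer is what lets the index itself stay byte-oriented.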

From dbalmain.ml at gmail.com Wed Feb 21 23:58:44 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 22 Feb 2007 15:58:44 +1100 Subject: [Ferret-talk] ferret webpage down In-Reply-To: References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com> <562a35c10702210903qacdd6d2l70338f096c62ecdf@mail.gmail.com> Message-ID:
On 2/22/07, John wrote: > Thanks for the cache links. > > I wanted to find out if there is information related to unicode support > in Ferret. Anyone here have any info regarding that? The quick answer is that Ferret generally treats strings as arrays of bytes so it can handle whatever strings the Analyzer gives it. Analyzers are pretty easy to implement. You can search the mailing list or look at the unit tests packaged with Ferret for examples. The default Analyzer will handle strings according to your locale settings so if your locale is set to utf-8 then the analyzer should parse utf-8 strings correctly. As for the website being down, Ben was half right. I'm not actually working on it yet but I plan to migrate to a combination of Colaboa and Ruse very soon. I was hoping to get to it a little sooner but it looks like I'll need to get trac up and running again for a little while longer. Cheers, Dave

From dbalmain.ml at gmail.com Thu Feb 22 00:05:05 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 22 Feb 2007 16:05:05 +1100 Subject: [Ferret-talk] Ferret progress update Message-ID:
Hi folks, Just thought I better let you all know that I'm still working on the next release of Ferret. I've been working the last 7 days doing nothing but Ferret development. The last iteration generated a diff of almost 5000 lines so there are some pretty major changes. Most people won't notice these changes however as the API remains unchanged.
But if you were having problems with FileNotFound errors or other types of segmentation faults the next version should fix most of them. I'm now going to go through the mailing list and the Trac bug reports to fix any other small problems lying around before I release the next version. Coming soon... -- Dave Balmain http://www.davebalmain.com/

From dbalmain.ml at gmail.com Thu Feb 22 02:20:52 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 22 Feb 2007 18:20:52 +1100 Subject: [Ferret-talk] Bug in IndexSearcher with limit => all and any offset In-Reply-To: <610758401e46d038dbbdb3383f67a6e9@ruby-forum.com> References: <610758401e46d038dbbdb3383f67a6e9@ruby-forum.com> Message-ID:
On 2/16/07, ahFeel wrote: > Here's the deal:
>
> static TopDocs *isea_search_w(Searcher *self,
>                               Weight *weight,
>                               int first_doc, // OFFSET
>                               int num_docs, // LIMIT
>                               Filter *filter,
>                               Sort *sort,
>                               filter_ft filter_func,
>                               bool load_fields)
> {
>     int max_size = first_doc + num_docs;
>
> Actually, when you have limit => :all, num_docs equals to INT_MAX, so > adding a value to it makes a nice int overflow :/ > > The diff patch is here: > http://pastie.caboo.se/40748 > > I've told Dave by mail but it seems like he's very busy lately, hope > someone else can release some fix here :) Thanks Jérémie, this bug has been fixed. -- Dave Balmain http://www.davebalmain.com/

From dbalmain.ml at gmail.com Thu Feb 22 02:33:57 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 22 Feb 2007 18:33:57 +1100 Subject: [Ferret-talk] Ferret seg-faulting during search In-Reply-To: <1171885175.26677.7.camel@localhost.localdomain> References: <1b315155becffc85175d3950a858baaf@ruby-forum.com> <1171885175.26677.7.camel@localhost.localdomain> Message-ID:
On 2/19/07, John Leach wrote: > Hi Joe, > > I've experienced lots of segfaults too. Last I heard from David Balmain > (Ferret's author) was that he knew what was causing it and is fixing it.
> That was last week, but he's been pretty busy with other things lately > so I guess he can't predict when a new release will be due. Hi guys, The segfault problem I've fixed is the one that occurs when you have multiple processes accessing the index. The segfault problem that Joe is getting sounds like it might be something else. If you can send me a reproducible test case that would be brilliant. I'm going to try and get a release out tomorrow, but if I can reproduce Joe's problem I'll try and fix it before I put out the release. -- Dave Balmain http://www.davebalmain.com/

From john at mightytofu.com Thu Feb 22 03:25:03 2007 From: john at mightytofu.com (John) Date: Thu, 22 Feb 2007 09:25:03 +0100 Subject: [Ferret-talk] ferret webpage down In-Reply-To: References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com> <562a35c10702210903qacdd6d2l70338f096c62ecdf@mail.gmail.com> Message-ID: <4b1c30ea8849e00fb8f989acede02b5f@ruby-forum.com>
Thanks, Dave. Just tried it out, and it works beautifully out of the box. Looking forward to the new version! David Balmain wrote: > > The quick answer is that Ferret generally treats strings as an arrays > of bytes so it can handle whatever strings the Analyzer gives it. > Analyzers are pretty easy to implement. You can search the mailing > list or look at the unit tests packaged with Ferret for examples. The > default Analyzer will handle strings according to your locale settings > so if your locale is set to utf-8 then the analyzer should parse utf-8 > strings correctly. > > As for the website being down, Ben was half right. I'm not actually > working on it yet but I plan to migrate to a combination of Colaboa > and Ruse very soon.
I was hoping to get to it a little sooner but it > looks like I'll need to get trac up and running again for a little > while longer. > > Cheers, > Dave

From kraemer at webit.de Thu Feb 22 04:17:07 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 22 Feb 2007 10:17:07 +0100 Subject: [Ferret-talk] Ferret progress update In-Reply-To: References: Message-ID: <20070222091707.GK16225@cordoba.webit.de>
On Thu, Feb 22, 2007 at 04:05:05PM +1100, David Balmain wrote: > Hi folks, > > Just thought I better let you all know that I'm still working on the > next release of Ferret. I've been working the last 7 days doing > nothing but Ferret development. The last iteration generated a diff of > almost 5000 lines so there are some pretty major changes. Most people > won't notice these changes however as the API remains unchanged. But > if you were having problems with FileNotFound errors or other types of > segmentation faults the next version should fix most of them. You rock :-) cheers, Jens

From john at johnleach.co.uk Thu Feb 22 06:08:54 2007 From: john at johnleach.co.uk (John Leach) Date: Thu, 22 Feb 2007 11:08:54 +0000 Subject: [Ferret-talk] Ferret progress update In-Reply-To: References: Message-ID: <1172142534.8608.2.camel@localhost.localdomain>
Thanks Dave! Looking forward to it. Can you tell us a bit more about what led to the segfault error cropping up? Have they been in the 0.10 branch all along? 0.9 too? Or did some new work break something? Maybe it will help others debug problems in future. John.
-- http://johnleach.co.uk On Thu, 2007-02-22 at 16:05 +1100, David Balmain wrote: > Hi folks, > > Just thought I better let you all know that I'm still working on the > next release of Ferret. I've been working the last 7 days doing > nothing but Ferret development. The last iteration generated a diff of > almost 5000 lines so there are some pretty major changes. Most people > won't notice these changes however as the API remains unchanged. But > if you were having problems with FileNotFound errors or other types of > segmentation faults the next version should fix most of them. > > I'm now going to go through the mailing list and the Trac bug reports > to fix any other small problems laying around before I release the > next version. Coming soon... >

From henke at mac.se Thu Feb 22 11:24:55 2007 From: henke at mac.se (Henrik Zagerholm) Date: Thu, 22 Feb 2007 17:24:55 +0100 Subject: [Ferret-talk] Combine ferret with database Message-ID: <40039EC3-7831-4698-B8E9-1B1E9A216ECA@mac.se>
Hello list, I wonder if someone has some tips on joining a ferret search with a database. I have a rails project using a postgresql backend and I would like to utilize the superb performance of ferret for fulltext searching. The problem is that I have to join the result with the database as I have some user access rights to different documents to take into account. Does anyone have any ideas how to do this easily?
Maybe a cool stored procedure in PGSQL that can utilize the ferret index :D Cheers, henrik

From ryansking at gmail.com Thu Feb 22 13:38:59 2007 From: ryansking at gmail.com (Ryan King) Date: Thu, 22 Feb 2007 10:38:59 -0800 Subject: [Ferret-talk] Combine ferret with database In-Reply-To: <40039EC3-7831-4698-B8E9-1B1E9A216ECA@mac.se> References: <40039EC3-7831-4698-B8E9-1B1E9A216ECA@mac.se> Message-ID: <846f30c70702221038q1d122b87i3277308c6854c8fb@mail.gmail.com>
On 2/22/07, Henrik Zagerholm wrote: > > Hello list, > > I wonder if someone has some tips on joining a ferret search with a > database. > > I have a rails project using a postgresql backend and I would like to > utilize the superb performance of ferret for fulltext searching. > The problem is that I have to joined the result with the database as > I have some user access rights to different documents to take into > account. > > Does anyone have any ideas how to do this easily? > > MAybe a cool stored procedure in PGSQL that can utilize the ferret > index :D Use acts_as_ferret [1], then either put the access rights stuff in the index, or post-process the results from AAF. -ryan 1. http://projects.jkraemer.net/acts_as_ferret/wiki

From dbalmain.ml at gmail.com Thu Feb 22 19:57:42 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 23 Feb 2007 11:57:42 +1100 Subject: [Ferret-talk] term vector blues In-Reply-To: <45DBC6EF.5050700@inforadical.net> References: <45DBC6EF.5050700@inforadical.net> Message-ID:
On 2/21/07, Caleb Clausen wrote: > Some more on this issue. I can narrow the crash down to an index with > just one document, and ferret crashes after getting its term vector a > number of times, perhaps as few as 2 or 3.
The version below is tuned to > crash quickly on my system, others may find it necessary to give other > numbers in the command line argument in order to make the crash happen > sooner. See below for the code. > > Some fiddling with the ferret source code reveals that disabling the > call to tv_destroy() in frt_ir_term_vector() (which is the implmentation > of #term_vector) seems to make the problem go away. Experimenting with > tv_destroy(), I found that disabling just the frees of offsets and > positions is enough to keep the crash away. This suggests that there's a > mismanagement of the allocation of the associated variables, but if so I > was unable to spot it in the source.... > > This is further than I've gotten in investigating this problem in a > while, but I'm unsure where to go next. Hi Caleb, After reading your first email I found the same behavior as you describe here. This is very frustrating because in this case I create completely independent Ruby objects. They don't reference the Ferret data space at all so this was the last place I expected to have garbage collection problems. It makes no sense to me at all that not freeing the offsets and positions arrays should make any difference at all. If you have any more ideas with regard to this problem I'd love to hear them as it has me a little stumped. -- Dave Balmain http://www.davebalmain.com/

From dbalmain.ml at gmail.com Thu Feb 22 21:11:06 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 23 Feb 2007 13:11:06 +1100 Subject: [Ferret-talk] term vector blues In-Reply-To: References: <45DBC6EF.5050700@inforadical.net> Message-ID:
On 2/23/07, David Balmain wrote: > On 2/21/07, Caleb Clausen wrote: > > Some more on this issue. I can narrow the crash down to an index with > > just one document, and ferret crashes after getting its term vector a > > number of times, perhaps as few as 2 or 3.
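Dave's workaround switches the collector off for a critical section and switches it back on only if it was on before. The same save-and-restore idea can be sketched at the Ruby level, together with `GC.stress`, which makes the collector run on nearly every allocation and is a common way to turn intermittent GC-related crashes like the ones in this thread into repeatable ones. This is a debugging aid under stated assumptions, not a fix: `without_gc` and `with_gc_stress` are hypothetical helper names, and `GC.stress` is only available in Ruby 1.9 and later, not in the 1.8.x releases discussed here.

```ruby
# Debugging aids for suspected GC problems in C extensions.
# without_gc / with_gc_stress are hypothetical helper names.
# GC.stress needs Ruby 1.9 or later.

# Run the block with the collector disabled, then restore the previous state.
# GC.disable returns true if GC was already disabled, much like
# rb_gc_disable() reports the old setting in C.
def without_gc
  was_disabled = GC.disable
  yield
ensure
  GC.enable unless was_disabled
end

# Run the block in GC stress mode: the collector runs on (nearly) every
# allocation, so an object that a C extension failed to keep reachable
# tends to be freed at once and the crash shows up early and repeatably.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

# Usage: a small pure-Ruby workload stands in for the suspect extension call.
total = with_gc_stress { Array.new(20) { |k| k.to_s }.size }
puts total                      # => 20
puts without_gc { GC.disable }  # => true (GC was already off inside the block)
```

If a crash disappears under `without_gc` but appears quickly under `with_gc_stress`, that points at objects created in C that the collector cannot see, which is the kind of problem described later in this thread.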
The version below is tuned to > > crash quickly on my system, others may find it necessary to give other > > numbers in the command line argument in order to make the crash happen > > sooner. See below for the code. > > > > Some fiddling with the ferret source code reveals that disabling the > > call to tv_destroy() in frt_ir_term_vector() (which is the implmentation > > of #term_vector) seems to make the problem go away. Experimenting with > > tv_destroy(), I found that disabling just the frees of offsets and > > positions is enough to keep the crash away. This suggests that there's a > > mismanagement of the allocation of the associated variables, but if so I > > was unable to spot it in the source.... > > > > This is further than I've gotten in investigating this problem in a > > while, but I'm unsure where to go next. > > Hi Caleb, > > After reading your first email I found the same things behavior as you > describe here. This is very frustrating because in this case I create > completely independent Ruby objects. They don't reference the Ferret > data space at all so this was the last place I expected to have > garbage collection problems. It makes no sense to me at all that not > freeing the offsets and positions arrays should make any difference at > all. If you have any more ideas with regard to this problem I'd love > to hear them as it has me a little stumped. I've made a little more progress on this. By disabling the garbage collector while building the term_vector I can prevent the segfault. I guess I need to spend some time to really understand how the ruby garbage collector works. 
Adding the top and bottom lines below prevents any segfault:

VALUE old_dont_gc = rb_gc_disable();
rtv = frt_get_tv(tv);
tv_destroy(tv);
if (old_dont_gc == Qfalse) rb_gc_enable();

-- Dave Balmain http://www.davebalmain.com/

From nappin713 at yahoo.com Thu Feb 22 22:16:46 2007 From: nappin713 at yahoo.com (Raymond O'connor) Date: Fri, 23 Feb 2007 04:16:46 +0100 Subject: [Ferret-talk] IOError on clearing locks In-Reply-To: <45145B26.3030800@blackkettle.org> References: <4513D654.4080909@blackkettle.org> <45145B26.3030800@blackkettle.org> Message-ID:
Has this been resolved? I'm still getting this problem on the latest Ferret. Is there any way around this problem? Thanks Ray Alex Young wrote: > David Balmain wrote: >>> Error occured in fs_store.c:146 - fs_clear_locks >>> >> >> Hi Alex, >> >> This is a bug which I'm fixing right now. If you open any >> FSDirectories then you must close them too before you rm_f the index >> dir. Unfortunately FSDirectory#close doesn't currently work and the >> Index class doesn't call it either so try 0.10.7 when I release it. > > Ah, ok. It's not actually affecting live code, it just makes a mess of > my tests as it stands. > > Keep up the good work :-)

From Neville.Burnell at bmsoft.com.au Fri Feb 23 00:49:30 2007 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Fri, 23 Feb 2007 16:49:30 +1100 Subject: [Ferret-talk] bug with boolean query evaluation containing parenthesis and NOT ? Message-ID: <126EC586577FD611A28E00A0C9A03758B5C505@maui.bmsoft.com.au>
Hi, The following [simplified] query works well, however a variation which includes parentheses seems to fail, in that it returns hits which should be excluded by the NOT term. This is surprising because in this simple case, the parentheses shouldn't change the Boolean evaluation ... any pointers?
Working Query: field1:value1 AND NOT field2:value2
Failing Query: field1:value1 AND ( NOT field2:value2 )
Kind Regards Neville From dbalmain.ml at gmail.com Fri Feb 23 01:11:08 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 23 Feb 2007 17:11:08 +1100 Subject: [Ferret-talk] Ferret progress update In-Reply-To: <1172142534.8608.2.camel@localhost.localdomain> References: <1172142534.8608.2.camel@localhost.localdomain> Message-ID: On 2/22/07, John Leach wrote: > Thanks Dave! Looking forward to it. > > Can you tell us a bit more about what led to the segfault error cropping > up? Have they been in the 0.10 branch all along? 0.9 too? Or did some > new work break something? > > Maybe it will help others debug problems in future. Well, the main problem I fixed was due to an error introduced in 0.10. I wasn't locking the commit log in all the places I should have. This actually would have been very easy to fix if someone had supplied a repeatable test case. In the end, though, I decided to implement lock-less commits, a new feature that has recently been added to Lucene. The main advantages of this are that you can open IndexReaders while an IndexWriter is committing, and you can open multiple IndexReaders at a time without them interrupting each other. It also makes it much easier to recover after a crash. If your system crashes in the middle of a commit, Ferret will be able to open the previously committed version of the index. As for the segfaults, I think I finally found the problem today. To improve the performance of Ferret's bindings, I was adding objects to Ruby Arrays directly instead of using the rb_ary_push method. Some of these arrays are quite large, so using rb_ary_push added a lot of overhead which I didn't think was really necessary ... but I didn't quite get it right.
For example, I had:

    rterms = rb_ary_new2(term_cnt);
    rts = RARRAY(rterms)->ptr;
    RARRAY(rterms)->len = term_cnt;
    for (i = 0; i < term_cnt; i++) {
        rts[i] = frt_get_tv_term(&terms[i]);
    }

In this example the number of terms in a field can be very large, and we save a lot of time[1] by setting the C array directly rather than using rb_ary_push. The problem occurs when the garbage collector gets called in the middle of filling the array. It will try to mark all of the objects contained in the array, but the array isn't filled yet, so many of its elements haven't been set. What I should have done was increment the array length as I went:

    rterms = rb_ary_new2(term_cnt);
    rts = RARRAY(rterms)->ptr;
    for (i = 0; i < term_cnt; i++) {
        rts[i] = frt_get_tv_term(&terms[i]);
        RARRAY(rterms)->len++;  /* only expose slots that are already set */
    }

This is a touch slower than the original code, but it works, so that's all that matters. You may be thinking I could have just set the length after the loop:

    rterms = rb_ary_new2(term_cnt);
    rts = RARRAY(rterms)->ptr;
    for (i = 0; i < term_cnt; i++) {
        rts[i] = frt_get_tv_term(&terms[i]);
    }
    RARRAY(rterms)->len = term_cnt;

But the problem here is that the elements that have been added to the array won't get marked by the garbage collector, because the array's length is still 0, so they could incorrectly be collected, which also causes a segfault. One alternative that will work is to use rb_mem_clear(), which takes a pointer to the element buffer rather than the array itself:

    rterms = rb_ary_new2(term_cnt);
    rts = RARRAY(rterms)->ptr;
    rb_mem_clear(rts, term_cnt);  /* initialize all elements to nil */
    RARRAY(rterms)->len = term_cnt;
    for (i = 0; i < term_cnt; i++) {
        rts[i] = frt_get_tv_term(&terms[i]);
    }

This makes sure all elements are set to nil before they are set to the term vector terms, so they are safe from the garbage collector. Anyway, sorry for the long and boring post. I guess the point is to think about how the garbage collector works when developing Ruby bindings. Cheers, Dave [1] How much faster?
About 20% faster according to a simple benchmark I just ran. Was it worth the segfaults? Of course not, but in a library like this you take the optimizations where you can get them. -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Fri Feb 23 01:22:45 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 23 Feb 2007 17:22:45 +1100 Subject: [Ferret-talk] IOError on clearing locks In-Reply-To: References: <4513D654.4080909@blackkettle.org> <45145B26.3030800@blackkettle.org> Message-ID: On 2/23/07, Raymond O'connor wrote: > Has this been resolved? I'm still getting this problem on the latest > Ferret. Is there any way around this problem? > > Thanks Ray Hi Ray, This problem (below) has been fixed in the current working copy of Ferret. I would still recommend explicitly closing your index and creating a new index. Deleting the directory shouldn't be necessary, but it will be safe to do so in the next release. Cheers, Dave > > Alex Young wrote: > > David Balmain wrote: > >>> Error occured in fs_store.c:146 - fs_clear_locks > >>> > >> > >> Hi Alex, > >> > >> This is a bug which I'm fixing right now. If you open any > >> FSDirectories then you must close them too before you rm_f the index > >> dir. Unfortunately FSDirectory#close doesn't currently work and the > >> Index class doesn't call it either so try 0.10.7 when I release it. > > > > Ah, ok. It's not actually affecting live code, it just makes a mess of > > my tests as it stands. > > > > Keep up the good work :-) > > > -- > Posted via http://www.ruby-forum.com/.
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Dave Balmain http://www.davebalmain.com/ From kraemer at webit.de Fri Feb 23 05:06:50 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 23 Feb 2007 11:06:50 +0100 Subject: [Ferret-talk] Combine ferret with database In-Reply-To: <846f30c70702221038q1d122b87i3277308c6854c8fb@mail.gmail.com> References: <40039EC3-7831-4698-B8E9-1B1E9A216ECA@mac.se> <846f30c70702221038q1d122b87i3277308c6854c8fb@mail.gmail.com> Message-ID: <20070223100650.GG19149@cordoba.webit.de> Hi! On Thu, Feb 22, 2007 at 10:38:59AM -0800, Ryan King wrote: > On 2/22/07, Henrik Zagerholm wrote: > > > >Hello list, > > > >I wonder if someone has some tips on joining a ferret search with a > >database. [..] > > > Use acts_as_ferret [1], then either put the access rights stuff in the > index, or post process the results from AAF. > > -ryan > 1. http://projects.jkraemer.net/acts_as_ferret/wiki To expand on this a bit: with aaf you can combine a Ferret query with the usual ActiveRecord conditions, joins and includes, like this:

    Model.find_by_contents(query, {}, { :conditions => ["... ], :include =>... })

or just use find_id_by_contents to retrieve only the id, model class and score for each hit, and build your own AR query from that. cheers, Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From john at johnleach.co.uk Fri Feb 23 06:28:02 2007 From: john at johnleach.co.uk (John Leach) Date: Fri, 23 Feb 2007 11:28:02 +0000 Subject: [Ferret-talk] Ferret progress update In-Reply-To: References: <1172142534.8608.2.camel@localhost.localdomain> Message-ID: <1172230082.10658.18.camel@localhost.localdomain> Hi Dave, interesting stuff.
Apparently you can tell the GC not to mess with your stuff using rb_gc_register_address (and rb_gc_unregister_address when/if you're done). Looking at gc.c, all it does is add the pointer to the GC's list of things that are being used, so it won't free it. Here is an example from the Ruby source, ext/iconv/iconv.c (showing it being used before object creation):

    rb_gc_register_address(&charset_map);
    charset_map = rb_hash_new();
    rb_define_singleton_method(rb_cIconv, "charset_map", charset_map_get, 0);

I guess you can register before filling the array, set the length, then unregister. I'm not sure this actually protects all the values in the array, though :/ If not, perhaps you could override the mark function for the array and restore it afterwards, heh. Perhaps not worth the fiddling. I'm no Ruby extension expert, though, so beware :) John. -- http://johnleach.co.uk On Fri, 2007-02-23 at 17:11 +1100, David Balmain wrote: > On 2/22/07, John Leach wrote: > > Thanks Dave! Looking forward to it. > > > > Can you tell us a bit more about what led to the segfault error cropping > > up? Have they been in the 0.10 branch all along? 0.9 too? Or did some > > new work break something? > > > > Maybe it will help others debug problems in future. > > Well, the main problem I fixed was due to an error introduced in 0.10. > I wasn't locking the commit log in all the places I should have. This > actually would have been very easy to fix if someone had supplied a > repeatable test case. In the end though I decided to lock-less > commits, a new feature that has recently been added to Lucene. The > main advantages of this are that you can open IndexReaders when an > IndexWriter is committing and you can open multiple IndexReaders at a > time without them interrupting each other. It also makes it much > easier to recover after a crash. If your system crashes in the middle > of a commit then Ferret will be able to open the previously committed > version of the index.
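Dave's explanation of the segfault boils down to one invariant: whenever the collector might run, every slot below the array's length must already hold a valid object. The toy Ruby model below is a sketch of that invariant only. It does not touch Ruby's real GC or C API, and ToyArray, UNINIT and fill are hypothetical names. A simulated mark pass runs part-way through each of the four fill strategies discussed in the thread, and fill returns whatever that pass would have kept alive.

```ruby
# Toy model of the fill-then-mark hazard; not Ruby's real GC or C API.
# ToyArray, UNINIT and fill are hypothetical names used for illustration.

# Stands in for whatever garbage sits in freshly allocated C memory.
UNINIT = Object.new.freeze

class ToyArray
  attr_accessor :len
  attr_reader :ptr

  def initialize(capa)
    @ptr = Array.new(capa, UNINIT) # like rb_ary_new2: capacity, no contents
    @len = 0
  end

  # Mimics the mark phase: only slots below len are visited, and visiting
  # an uninitialized slot plays the role of the segfault.
  def mark
    @ptr.first(@len).each do |v|
      raise "GC touched an uninitialized slot" if v.equal?(UNINIT)
    end
    @ptr.first(@len)
  end
end

# Fills n slots, running a simulated GC just before slot gc_at is written.
# Returns the objects that GC pass would have marked (i.e. kept alive).
def fill(strategy, n, gc_at)
  ary = ToyArray.new(n)
  case strategy
  when :len_first   # original code: len set before the loop
    ary.len = n
  when :clear_first # rb_mem_clear variant: nil-fill first, then set len
    ary.ptr.fill(nil)
    ary.len = n
  end
  marked = nil
  n.times do |i|
    marked = ary.mark if i == gc_at
    ary.ptr[i] = "term#{i}"
    ary.len += 1 if strategy == :incremental # the fixed version
  end
  ary.len = n if strategy == :len_last
  marked
end
```

With a GC pass before the third slot, fill(:incremental, 5, 2) returns ["term0", "term1"], fill(:clear_first, 5, 2) returns ["term0", "term1", nil, nil, nil], fill(:len_last, 5, 2) returns [] even though two live strings are already stored (so they would be unprotected), and fill(:len_first, 5, 2) raises.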
From bk at benjaminkrause.com Fri Feb 23 07:08:21 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Fri, 23 Feb 2007 13:08:21 +0100 Subject: [Ferret-talk] ferret webpage down In-Reply-To: References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com> <562a35c10702210903qacdd6d2l70338f096c62ecdf@mail.gmail.com> Message-ID: <49AF61F7-E4AB-42AA-8E21-0947EB9EC0D6@benjaminkrause.com> > working on it yet but I plan to migrate to a combination of Colaboa > and Ruse very soon. I was hoping to get to it a little sooner but it Can you post any URLs about Collaboa/Ruse? I've never heard of them. Ben From jan.prill at gmail.com Fri Feb 23 07:36:42 2007 From: jan.prill at gmail.com (Jan Prill) Date: Fri, 23 Feb 2007 12:36:42 +0000 Subject: [Ferret-talk] ferret webpage down In-Reply-To: <49AF61F7-E4AB-42AA-8E21-0947EB9EC0D6@benjaminkrause.com> References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com> <562a35c10702210903qacdd6d2l70338f096c62ecdf@mail.gmail.com> <49AF61F7-E4AB-42AA-8E21-0947EB9EC0D6@benjaminkrause.com> Message-ID: <562a35c10702230436x6845869en7344326a6f801950@mail.gmail.com> On 2/23/07, Benjamin Krause wrote: > > > > working on it yet but I plan to migrate to a combination of Colaboa > > and Ruse very soon. I was hoping to get to it a little sooner but it > > can you post any urls about colaboa/ruse ? never heard of that. > > Ben > http://rubyforge.org/projects/ruse/ http://collaboa.org/ Cheers, Jan Prill -- http://www.inviado.de - Internetseiten für RAe http://www.xing.com/profile/Jan_Prill From kraemer at webit.de Fri Feb 23 07:52:59 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 23 Feb 2007 13:52:59 +0100 Subject: [Ferret-talk] ferret webpage down In-Reply-To: <49AF61F7-E4AB-42AA-8E21-0947EB9EC0D6@benjaminkrause.com> References: <3331eb2576f4bff4c8deb591eddc54ff@ruby-forum.com> <41363.213.61.189.202.1171983304.squirrel@orkland.homeunix.org> <026f3085bd31df2b74a17fca53e8e553@ruby-forum.com> <812D5D25-D81F-4A15-BF40-E99336ED6ACC@benjaminkrause.com> <562a35c10702210903qacdd6d2l70338f096c62ecdf@mail.gmail.com> <49AF61F7-E4AB-42AA-8E21-0947EB9EC0D6@benjaminkrause.com> Message-ID: <20070223125259.GJ19149@cordoba.webit.de> On Fri, Feb 23, 2007 at 01:08:21PM +0100, Benjamin Krause wrote: > > > working on it yet but I plan to migrate to a combination of Colaboa > > and Ruse very soon. I was hoping to get to it a little sooner but it > > can you post any urls about colaboa/ruse ? never heard of that. Collaboa is a Rails-based Trac clone with some additional features like multi-project capabilities: http://collaboa.org/ Ruse is a wiki engine: http://wikis.onestepback.org/Ruse/page/show/InstallationGuide http://onestepback.org/index.cgi/Tech/Ruse Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From dbalmain.ml at gmail.com Fri Feb 23 08:51:01 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 24 Feb 2007 00:51:01 +1100 Subject: [Ferret-talk] bug with boolean query evaluation containing parenthesis and NOT ?
In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5C505@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5C505@maui.bmsoft.com.au> Message-ID: On 2/23/07, Neville Burnell wrote: > Hi, > > The following [simplified] query works well, however a variation which > includes parenthesis seems to fail, in that it returns hits which should > be excluded by the NOT term. > > This is surprising because in this simple case, the parenthesis > shouldn't change the Boolean evaluation ... any pointers? > > Working Query: field1:value1 AND NOT field2:value2 > Failing Query: field1:value1 AND ( NOT field2:value2 ) This is a carry over from Lucene. Currently queries must have at least one positive clause. So a search for; NOT field2:value2 will return nothing. So ANDing this clause with another clause will also return nothing. NOT clauses are more like filters than real boolean clauses I guess. You are correct to say this is surprising and it is something I should probably fix but it isn't urgent. I'll put it on my TODO list. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Fri Feb 23 10:04:17 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 24 Feb 2007 02:04:17 +1100 Subject: [Ferret-talk] bug with boolean query evaluation containing parenthesis and NOT ? In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5C505@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5C505@maui.bmsoft.com.au> Message-ID: On 2/23/07, Neville Burnell wrote: > Working Query: field1:value1 AND NOT field2:value2 > Failing Query: field1:value1 AND ( NOT field2:value2 ) Ok, I decided to fix this after all. Look out for the next release. I still have a few more bugs to fix but it should be out some time over the weekend. 
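Dave's explanation is that a clause with no positive part, such as NOT field2:value2 on its own, matches nothing. Nesting it inside another AND therefore does not behave the way plain boolean logic suggests. The toy Ruby model below is a hypothetical sketch of that rule and of one possible fix: treat a purely negative group as "all documents minus the excluded ones". It works over three in-memory documents. Term, Bool and the :lucene and :fixed modes are illustrative names, not Ferret's API, and the model does not reproduce Ferret's query parser.

```ruby
require 'set'

# Toy model: NOT clauses only filter positive matches (Lucene-style), so a
# group with no positive clause matches nothing unless mode is :fixed.
# Term, Bool, :lucene and :fixed are hypothetical, not Ferret's API.
DOCS = {
  1 => { field1: 'value1', field2: 'value2' },
  2 => { field1: 'value1', field2: 'other' },
  3 => { field1: 'x',      field2: 'other' }
}.freeze

Term = Struct.new(:field, :value) do
  def matches(docs, _mode)
    docs.select { |_id, doc| doc[field] == value }.keys.to_set
  end
end

# must: clauses ANDed together; must_not: clauses whose hits are removed.
Bool = Struct.new(:must, :must_not) do
  def matches(docs, mode)
    positive =
      if must.empty?
        # :lucene => a purely negative group matches nothing.
        # :fixed  => start from every document, then subtract the NOTs.
        mode == :fixed ? docs.keys.to_set : Set.new
      else
        must.map { |clause| clause.matches(docs, mode) }.reduce(:&)
      end
    must_not.reduce(positive) { |acc, clause| acc - clause.matches(docs, mode) }
  end
end

a = Term.new(:field1, 'value1')
b = Term.new(:field2, 'value2')
FLAT   = Bool.new([a], [b])                   # field1:value1 AND NOT field2:value2
NESTED = Bool.new([a, Bool.new([], [b])], []) # field1:value1 AND ( NOT field2:value2 )
```

Under :lucene, FLAT matches only document 2 while NESTED matches nothing, because the inner group contributes an empty set to the intersection. Under :fixed, the inner group evaluates to documents 2 and 3, and NESTED matches document 2 just like FLAT.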
-- Dave Balmain http://www.davebalmain.com/ From henke at mac.se Fri Feb 23 17:09:06 2007 From: henke at mac.se (Henrik Zagerholm) Date: Fri, 23 Feb 2007 23:09:06 +0100 Subject: [Ferret-talk] Combine ferret with database In-Reply-To: <20070223100650.GG19149@cordoba.webit.de> References: <40039EC3-7831-4698-B8E9-1B1E9A216ECA@mac.se> <846f30c70702221038q1d122b87i3277308c6854c8fb@mail.gmail.com> <20070223100650.GG19149@cordoba.webit.de> Message-ID: On 23 Feb 2007, at 11:06, Jens Kraemer wrote: > Hi! > > On Thu, Feb 22, 2007 at 10:38:59AM -0800, Ryan King wrote: >> On 2/22/07, Henrik Zagerholm wrote: >>> >>> Hello list, >>> >>> I wonder if someone has some tips on joining a ferret search with a >>> database. > [..] >> >> >> Use acts_as_ferret [1], then either put the access rights stuff in >> the >> index, or post process the results from AAF. >> >> -ryan >> 1. http://projects.jkraemer.net/acts_as_ferret/wiki > > to expand this a bit, with aaf you can combine a ferret query with > the usual active record conditions, joins and includes like that: > > Model.find_by_contents(query, {}, { :conditions => > ["... ], :include =>... }) > > or just use find_id_by_contents to only retrieve id, model class and > score for each hit, and build your own AR query from that. > Hmm, this is really interesting. It looks like exactly what I want. I'll give it a try and see how it goes. Thanks for the info! Cheers, henrik > cheers, > Jens > > > -- > Jens Krämer > webit!
Gesellschaft für neue Medien mbH > Schnorrstraße 76 | 01069 Dresden > Telefon +49 351 46766-0 | Telefax +49 351 46766-66 > kraemer at webit.de | www.webit.de > > Amtsgericht Dresden | HRB 15422 > GF Sven Haubold, Hagen Malessa From ryansking at gmail.com Sat Feb 24 04:09:42 2007 From: ryansking at gmail.com (Ryan King) Date: Sat, 24 Feb 2007 01:09:42 -0800 Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret In-Reply-To: <20070204193455.GB29012@cordoba.webit.de> References: <20070204193455.GB29012@cordoba.webit.de> Message-ID: <846f30c70702240109y4b20b488i52b9a68b064d0b13@mail.gmail.com> On 2/4/07, Jens Kraemer wrote: > > Hi! > > Aaf trunk has undergone several major refactorings the last days, with > the result that you can now transparently switch your app from local > to remote indexing and back :-) > > If you plan to scale your app to more than one physical machine, or > if you have problems with corrupted indexes and the like under high > load, you really should give this a try. > > I wrote some documentation to get you started with the remote indexing > stuff at > http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer FWIW, I'm running this in production with about 5 updates/sec and 20-30 searches/second without problems. Awesome! -ryan From dbalmain.ml at gmail.com Sat Feb 24 07:34:07 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 24 Feb 2007 23:34:07 +1100 Subject: [Ferret-talk] How to have 'o' == 'ö' In-Reply-To: <9E80FA17-7AF1-40DA-BBC5-7ADFDEDA2077@hashref.com> References: <11d4433ce411d7f457fdf09671e32b58@ruby-forum.com> <20070122134913.GL29989@cordoba.webit.de> <9E80FA17-7AF1-40DA-BBC5-7ADFDEDA2077@hashref.com> Message-ID: On 1/23/07, Xavier Noria wrote: > On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote: > > > On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote: > >> Greetings, > >> > >> (using acts_as_ferret) > >> > >> So I have a book title "M?ngrel ?Horsemen"" in my index. > >> > >> Searching for "M?ngrel" retrieves the document. > >> > >> But I would like searching for "Mongrel" to also retrieve the > >> document. > >> Which it does not currently. > >> > >> Anyone have any good solutions to this problem? > >> > >> I suppose I could filter the documents and queries first which > >> something > >> like: > >> > >> > >> (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "M?ngrel > >> ?Horsemen"").gsub(/[^a-zA-Z0-9]/im,"") > >> > >> But perhaps there is a better, or built in solution. > > > > I don't think so - a custom Analyzer would be the right place for > > this. > > We use a normalizer to store/query (to be revised for Rails 1.2): > > # Utility method that returns an ASCIIfied, downcased, and > sanitized string. > # It relies on the Unicode Hacks plugin by means of String#chars. > We assume > # $KCODE is 'u' in environment.rb. By now we support a wide range > of latin > # accented letters, based on the Unicode Character Palette bundled > in Macs.
> def self.normalize(str) > n = str.chars.downcase.strip.to_s > n.gsub!(/[????????]/, 'a') > n.gsub!(/?/, 'ae') > n.gsub!(/[??]/, 'd') > n.gsub!(/[?????]/, 'c') > n.gsub!(/[?????????]/, 'e') > n.gsub!(/?/, 'f') > n.gsub!(/[????]/, 'g') > n.gsub!(/[??]/, 'h') > n.gsub!(/[????????]/, 'i') > n.gsub!(/[????]/, 'j') > n.gsub!(/[??]/, 'k') > n.gsub!(/[?????]/, 'l') > n.gsub!(/[??????]/, 'n') > n.gsub!(/[??????????]/, 'o') > n.gsub!(/?/, 'oe') > n.gsub!(/?/, 'q') > n.gsub!(/[???]/, 'r') > n.gsub!(/[?????]/, 's') > n.gsub!(/[????]/, 't') > n.gsub!(/[??????????]/, 'u') > n.gsub!(/?/, 'w') > n.gsub!(/[???]/, 'y') > n.gsub!(/[???]/, 'z') > n.gsub!(/\s+/, ' ') > n.gsub!(/[^\sa-z0-9_-]/, '') > n > end > > And this convenience class method to use in Rails models with > acts_as_ferret (slightly edited): > > # Wrapper function to normalize fields before calling acts_as_ferret > # > # Usage: index_fields [:field1, :field2], :option1 > => ..., :option2 => ... > # > # Please note that your queries should use a "_normalized" suffix on > # each field, i.e: +field1_normalized:foo > class ActiveRecord::Base > def self.index_fields(fields, *options) > aaf_fields = [] > fields.each do |f| > class_eval <<-EOS > def #{f}_normalized > MyAppUtils.normalize(#{f}) > end > EOS > aaf_fields.push ":#{f}_normalized" > end > aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join > (',') + ']' > options.each do |option_pair| > option_pair.each do |key, value| > aaf_call << ", :#{key} => #{value}" > end > end > logger.info aaf_call > class_eval(aaf_call) > end > end > > -- fxn Sorry to bring this one back from the archives (I'm going through all the email I've missed in my long absence). Anyway, I thought that since not even Jens knew about this I should point out the existence of MappingFilter: http://ferret.davebalmain.com/api/classes/Ferret/Analysis/MappingFilter.html It essentially does the same thing as Xavier's code above but it is much faster. 
It compiles the mappings to a single deterministic finite automaton (DFA): http://en.wikipedia.org/wiki/Deterministic_finite_state_machine Basically, this means the filter does a single pass through the string to do all the mappings rather than a pass for each mapping. Hope that helps somebody, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Sat Feb 24 08:58:35 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 25 Feb 2007 00:58:35 +1100 Subject: [Ferret-talk] QueryParser Exception Handling Problem In-Reply-To: <51b62b253cf115e5ba4f505f5e940180@ruby-forum.com> References: <0d9ca21646921fef5c353daf7d9a2735@ruby-forum.com> <6d879fc16968990ecc48406cd920bff2@ruby-forum.com> <20061211083659.GX4076@cordoba.webit.de> <620e9f05e17278d68b3159037e9f6164@ruby-forum.com> <51b62b253cf115e5ba4f505f5e940180@ruby-forum.com> Message-ID: On 12/16/06, Mark wrote: > Jens, > > My response was pre-mature, I have a few tests that throw the following > potentially malicious search queries... > > bad_chars = [':', '(, )', '[, ]', '{, }', '!', '+', '"', '~', '^', '-', > '|', '<, >', '=', '*', '?', '\'', '