From san.r.patil at gmail.com  Fri Nov 13 14:51:30 2009
From: san.r.patil at gmail.com (Santoshkumar Patil)
Date: Fri, 13 Nov 2009 11:51:30 -0800 (PST)
Subject: [Ferret-talk] Invitation to connect on LinkedIn
Message-ID: <1006437251.9947758.1258141890388.JavaMail.app@ech3-cdn13.prod>

LinkedIn
------------

I'd like to add you to my professional network on LinkedIn.

- Santoshkumar

Confirm that you know Santoshkumar Patil
https://www.linkedin.com/e/isd/861195973/ZwZ03mvk/

From toastkid.williams at gmail.com  Thu Nov 26 06:13:12 2009
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 26 Nov 2009 11:13:12 +0000
Subject: [Ferret-talk] Problem with case sensitivity
Message-ID:

I'm using a custom stem analyser in my searches and my indexing. The
analyser is defined thus:

    module Ferret::Analysis
      class StemmingAnalyzer
        def token_stream(field, text)
          text.downcase!
          RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field.inspect}, text = #{text.inspect}"
          tokenizer = StandardTokenizer.new(text)
          filter = StemFilter.new(tokenizer)
          filter
        end
      end
    end

I use it in my indexing like this:

    acts_as_ferret({ :store_class_name => true,
                     :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
                     :fields => { :property_names => { :boost => 3.0 },
                                  ....etc }})

And in a search like this:

    search_class.find_ids_with_ferret(search_term, {:limit => 10000, :analyzer => Ferret::Analysis::StemmingAnalyzer.new}) do |model, r_id, score|
      r_id = r_id.to_i
      ferret_ids << r_id
      self.scores_hash[r_id] = score
    end

I have a problem with case sensitivity - basically, searches only work
when they are lowercase, even when it looks like the text stored in the
index is uppercase.

From the console:

    >> resource.to_doc
    => {:resource_id=>"59", :property_names=>"Bb Clarinet Clarinet Family Woodwind Instrumental and Vocal Image Resources Types" }
    >> TeachingObject.find_with_ferret("Vocal", :page => 1, :per_page => 1000).include?(resource)
    => false
    >> TeachingObject.find_with_ferret("vocal", :page => 1, :per_page => 1000).include?(resource)
    => true

I think I have my stemming set up wrong; I'm not sure if it is even
being used. I implemented it so that searches would match both
pluralised and singular terms, and that seems to work, e.g.

    >> TeachingObject.find_with_ferret("vocals", :page => 1, :per_page => 1000).include?(resource)
    => true

But the case sensitivity thing has me stumped. I thought that the
downcase! call on the search term would make case irrelevant for
searching, but that seems not to be the case. Can anyone set me
straight?

From toastkid.williams at gmail.com  Thu Nov 26 06:58:33 2009
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 26 Nov 2009 12:58:33 +0100
Subject: [Ferret-talk] Problem with case sensitivity
In-Reply-To:
References:
Message-ID: <376b32f175cc637ef441217fe2033fc7@ruby-forum.com>

I think I fixed this. I did three things:

- changed my custom analyser to inherit from Ferret::Analysis::Analyzer
- ditched the downcase! line
- instead of doing downcase!, I added LowerCaseFilter.new(filter) to my
  chain

    module Ferret::Analysis
      class StemmingAnalyzer < Ferret::Analysis::Analyzer
        def token_stream(field, text)
          RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field.inspect}, text = #{text.inspect}"
          tokenizer = StandardTokenizer.new(text)
          filter = StemFilter.new(tokenizer)
          low_filter = LowerCaseFilter.new(filter)
          low_filter
        end
      end
    end

After calling ferret_update on the resource, I can now get it with
'vocal' or 'Vocal'. I'd still welcome any further advice on this, in
case I'm not doing something right.

thanks,
max

-- 
Posted via http://www.ruby-forum.com/.
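
On the request for further advice: one thing that may be worth checking in a chain like the one above is the order of the filters. The sketch below is plain Ruby and does not use Ferret at all. ToyTokenizer, ToyLowerCaseFilter and ToyStemFilter are hypothetical stand-ins, and the toy stemmer is deliberately case-sensitive, which is an assumption made for illustration rather than a statement about how Ferret's StemFilter behaves. Under that assumption, lowercasing before stemming gives one term for every casing of a word, while stemming before lowercasing can leave all-caps plurals unstemmed.

```ruby
# Pure-Ruby stand-ins for a token filter chain. These are NOT Ferret's
# classes; every name here is hypothetical and exists only for illustration.

# Splits text into word tokens, preserving their original case.
class ToyTokenizer
  include Enumerable

  def initialize(text)
    @text = text
  end

  def each(&block)
    @text.scan(/[[:alnum:]]+/).each(&block)
  end
end

# Lowercases every token coming from the upstream stream.
class ToyLowerCaseFilter
  include Enumerable

  def initialize(stream)
    @stream = stream
  end

  def each
    @stream.each { |token| yield token.downcase }
  end
end

# Toy stemmer: strips one trailing lowercase "s". It is case-sensitive on
# purpose (an assumption), so "VOCALS" passes through unchanged.
class ToyStemFilter
  include Enumerable

  def initialize(stream)
    @stream = stream
  end

  def each
    @stream.each { |token| yield token.sub(/s\z/, "") }
  end
end

# Same order as the chain in the message: stem first, then lowercase.
def stem_then_lower(text)
  ToyLowerCaseFilter.new(ToyStemFilter.new(ToyTokenizer.new(text))).to_a
end

# Alternative order: lowercase first, then stem.
def lower_then_stem(text)
  ToyStemFilter.new(ToyLowerCaseFilter.new(ToyTokenizer.new(text))).to_a
end

p stem_then_lower("Vocals VOCALS")  # => ["vocal", "vocals"]
p lower_then_stem("Vocals VOCALS")  # => ["vocal", "vocal"]
```

If the real stemmer turns out to be case-sensitive in the same way, swapping the two filters so that the lowercase filter wraps the tokenizer and the stem filter wraps the lowercase filter would be the equivalent change. Either way, the same chain has to be used for both indexing and searching, and existing documents need re-indexing after the analyser changes.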