From william.yeung at gearboxsoft.com Mon Aug 17 23:41:35 2009
From: william.yeung at gearboxsoft.com (Yeung William)
Date: Tue, 18 Aug 2009 11:41:35 +0800
Subject: [Ferret-talk] Ferret Usability
Message-ID: <7BE72E1A-A02C-4720-A6FA-A1776A158650@gearboxsoft.com>

Guys,

I am new to Ferret, and I have mixed feelings about it.

On one side, I really like the simplicity of the system: it's easy to deploy and use, I have a lot of choices for integration starting from aaf, and doing my own integration isn't too hard either.

On the other hand, I have heard a lot of horror stories, from index corruption to segfaults. The most classic thread I can find is here:

http://groups.google.com/group/rubyonrails-deployment/browse_thread/thread/980fe7cb20cb97dd

Even Ezra at EY is basically saying Ferret is unusable.

May I know what the situation is now? Can anyone pin down what actually caused their segfaults or index corruption?

From femtowin at gmail.com Tue Aug 18 02:27:15 2009
From: femtowin at gmail.com (femto Zheng)
Date: Tue, 18 Aug 2009 14:27:15 +0800
Subject: [Ferret-talk] Can't remove duplicate
Message-ID: <91170ee40908172327w5f87b2b9we92bb250bc7d01f@mail.gmail.com>

Hello all,

I can't remove duplicates. I'm using Ferret to index log files in order to monitor application activity. What I want is to index data based on the uniqueness of [filename, line] (actually it should be [host, filename, line]). The code is the following:

    if !$indexer
      field_infos = Ferret::Index::FieldInfos.new(:index => :untokenized_omit_norms,
                                                  :term_vector => :no)
      field_infos.add_field(:content, :store => :yes, :index => :yes)
      $indexer = Ferret::I.new(:path => index_dir,
                               :field_infos => field_infos,
                               :key => [:filename, :line],
                               :max_buffered_docs => 100)
      #$indexer ||= Ferret::I.new(:path=>index_dir, :key => ['filename', 'line'], :max_buffered_docs=>100)
      #unique host,file_name,line
      #$indexer.field_infos.add_field(:time,
      #  #:default_boost => 20,
      #  :store => :yes,
      #  :index => :untokenized,
      #  :term_vector => :no)
    end

The problem is that it indexes a new document even when [filename, line] is the same. Changing the option to :key => ["filename", "line"] doesn't work either. What's the problem?

Thanks.

From femtowin at gmail.com Tue Aug 18 03:30:30 2009
From: femtowin at gmail.com (femto Zheng)
Date: Tue, 18 Aug 2009 15:30:30 +0800
Subject: [Ferret-talk] How do I index very large file?
Message-ID: <91170ee40908180030qd5086a2s96e370d2d4adad3@mail.gmail.com>

Hello all,

I'm building a monitoring application that fetches an application's log files and indexes them. How do I index a very large file, up to several GB? The application can write a very large log in a short time.
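
For the large-file question above, one possible approach is to avoid loading the file at all and stream it one line at a time, adding one document per line. The sketch below is only an illustration under assumptions: the method name index_log and the start_line option are hypothetical, and the indexer is treated as any object that accepts a Hash via <<, as the Ferret index in the previous message appears to. It has not been run against a real Ferret index.

```ruby
# Stream a log one line at a time so memory use does not grow with file size.
#
# io         - any object with #each_line (an open File, a StringIO, ...)
# filename   - value stored in the :filename field of every document
# indexer    - ASSUMPTION: any object that accepts a Hash document via <<
#              (an Array works for a dry run)
# start_line - hypothetical option: number of lines already indexed on a
#              previous run; those lines are skipped
#
# Returns the number of documents handed to the indexer.
def index_log(io, filename, indexer, start_line: 0)
  added = 0
  io.each_line.with_index(1) do |text, lineno|
    next if lineno <= start_line
    # The line number is stored as a string so it can be matched as a term.
    indexer << { :filename => filename, :line => lineno.to_s, :content => text.chomp }
    added += 1
  end
  added
end

# Example (not executed here):
#   File.open("/var/log/app.log") { |f| index_log(f, "/var/log/app.log", $indexer) }
```

Remembering the last line number between runs and passing it as start_line would let a monitor pick up only the new tail of a growing log instead of re-reading several GB each time.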