From me at benjaminarai.com Sun Mar 2 00:29:39 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Sat, 1 Mar 2008 21:29:39 -0800
Subject: [Ferret-talk] Multiple Filters?
In-Reply-To:
References:
Message-ID:

I am having the same problem.

I cannot figure out how to get filters working using options for
Index::Index search.

http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000022

For example suppose I am trying to run the following query:

Search for "frog" filtering on color=green or color=red

----------------------
I have tried the following with no success:

## Search for "frog"
t = TermQuery.new(:content, "frog")
b = BooleanQuery.new(t,:should)

## Filter on color=(green OR red)
t1 = TermQuery.new(:color,"red")
t2 = TermQuery.new(:color,"green")
b1 = BooleanQuery.new
b1.add_query(t1,:should)
b1.add_query(t2,:should)
b2 = BooleanQuery.new
b2.add_query(b1,:must)

## merge query and filter
b.add_query(FilterQuery.new(b2),:must)

## setup index
index = Index::Index.new(:path => @index_dir)

## search
@result = index.search(b, options={})

----------------------
This last line causes the following error:

ruby(534,0xa06fdfa0) malloc: *** error for object 0x4c1a21: Non-aligned pointer being freed
*** set a breakpoint in malloc_error_break to debug
ruby(534,0xa06fdfa0) malloc: *** error for object 0x7038: Non-aligned pointer being freed
*** set a breakpoint in malloc_error_break to debug
label:president content:president + limit10offset0
--- Entering query execution ---
/Library/Ruby/Gems/1.8/gems/ferret-0.11.6/lib/ferret/index.rb:768: [BUG] Segmentation fault
ruby 1.8.6 (2007-09-24) [universal-darwin9.0]

Abort trap
----------------------

???

Benjamin

On Feb 28, 2008, at 4:23 AM, Bira wrote:

> First, I'd like to thank you all for your patience and help on the
> "Document Scores" issue :).
>
> Now, I have a bit of a "noob" question... Is there a way to apply
> multiple filters to the same query?
> For example, I want to apply both
> a RangeFilter and a QueryFilter, but from what I've seen in the API
> docs, the :filter parameter that can be passed to the Searcher accepts
> only one filter. It does mention the possibility of applying filters
> to each other, but provides no examples.
>
> Is this possible? How can it be done?
>
> --
> Bira
> http://compexplicita.wordpress.com
> http://compexplicita.tumblr.com
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20080301/17456f97/attachment-0001.html

From me at benjaminarai.com Sun Mar 2 01:03:42 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Sat, 1 Mar 2008 22:03:42 -0800
Subject: [Ferret-talk] Compiling ferret trunk on OSX Leopard?
Message-ID: <4C4B88A9-96EA-4E70-A3D9-893C99A277E5@benjaminarai.com>

Does ferret trunk compile on OSX Leopard?

Benjamin

From lars.heese at googlemail.com Sun Mar 2 07:29:13 2008
From: lars.heese at googlemail.com (Lars Heese)
Date: Sun, 2 Mar 2008 13:29:13 +0100
Subject: [Ferret-talk] Group By in acts_as_ferret
Message-ID: <2e3bdda1a53a088ee754eae084276291@ruby-forum.com>

Hello,

can somebody tell me, if I can use the "group" option with
acts_as_ferret?

I want to find out in which categories and manufacturers categories the
results are in and show it on the left side (like amazon and eBay).

Look here:
http://search.ebay.com/search/search.dll?from=R40&_trksid=m37&satitle=canon&category0=

Best regards
Lars
--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Sun Mar 2 12:10:12 2008
From: kraemer at webit.de (Jens Kraemer)
Date: Sun, 2 Mar 2008 18:10:12 +0100
Subject: [Ferret-talk] Possible bug when creating a Ferret::Search::Sort object?
In-Reply-To:
References:
Message-ID: <20080302171012.GA28250@cordoba.webit.de>

Hi,

this works fine for me with 0.11.6 with stock Ubuntu and Debian Ruby
versions:

ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] (Ubuntu 7.10)
ruby 1.8.5 (2006-08-25) [x86_64-linux] (Debian 4.0 (stable))

irb(main):001:0> require 'ferret'
=> true
irb(main):002:0> Ferret::Search::Sort.new(
irb(main):003:1* [
irb(main):004:2* Ferret::Search::SortField::SCORE,
irb(main):005:2* Ferret::Search::SortField::DOC_ID
irb(main):006:2> ],
irb(main):007:1* true
irb(main):008:1> )
=> Sort[!, !]
irb(main):009:0> Ferret::VERSION
=> "0.11.6"

Cheers,
Jens

On Fri, Feb 29, 2008 at 11:19:30PM -0300, Bira wrote:
> I may have run across a bug in Ferret: it throws a segmentation fault
> when I try to create a Sort object using the default fields (SCORE and
> DOC_ID), but setting reverse to true.
>
> Here's the minimal example:
>
> #!/usr/bin/env ruby
> require 'rubygems'
> require 'ferret'
>
> Ferret::Search::Sort.new
>
> Ferret::Search::Sort.new(
>   [
>     Ferret::Search::SortField::SCORE,
>     Ferret::Search::SortField::DOC_ID
>   ],
>   false
> )
>
> Ferret::Search::Sort.new(
>   [
>     Ferret::Search::SortField::SCORE_REV,
>     Ferret::Search::SortField::DOC_ID_REV
>   ],
>   false
> )
>
> Ferret::Search::Sort.new(
>   [
>     Ferret::Search::SortField::SCORE,
>     Ferret::Search::SortField::DOC_ID
>   ],
>   true
> )
>
> You should get something like this when creating the last object:
>
> $ ruby sort.rb
> sort.rb:23: [BUG] Segmentation fault
> ruby 1.8.6 (2007-09-24) [x86_64-linux]
>
> Aborted
>
>
> Again, this is with Ferret 0.11.6 in Linux.
>
> Is this a known problem that's being worked on, or should I report it
> at the Trac tool on ferret.davebalmain.com?
>
> --
> Bira
> http://compexplicita.wordpress.com
> http://compexplicita.tumblr.com
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From jk at jkraemer.net Sun Mar 2 12:34:49 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Sun, 2 Mar 2008 18:34:49 +0100
Subject: [Ferret-talk] Multiple Filters?
In-Reply-To:
References:
Message-ID: <20080302173448.GW31406@thunder.jkraemer.net>

Hi!

On Sat, Mar 01, 2008 at 09:29:39PM -0800, Benjamin Arai wrote:
> I am having the same problem.
>
> I cannot figure out how to get filters working using options for
> Index::Index search.
>
> http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000022
>
> For example suppose I am trying to run the following query:
>
> Search for "frog" filtering on color=green or color=red

your problem most probably results from improper API usage. That
shouldn't be a reason for Ferret to segfault, but that's another story.

The QueryFilter class is a filter, and not a query, and you can't add a
filter to a boolean query. Instead, build a FilteredQuery to combine
your original query with the filter.

Here's a snippet that achieves what you want:

require 'rubygems'
require 'ferret'
include Ferret
include Ferret::Search

i = I.new
i << {:color => 'green', :content => 'frog'}
i << {:color => 'red', :content => 'frog'}
i << {:color => 'blue', :content => 'frog'}

# Search for "frog"
t = TermQuery.new(:content, "frog")

# Filter on color=(green OR red)
t1 = TermQuery.new(:color,"red")
t2 = TermQuery.new(:color,"green")
b1 = BooleanQuery.new
b1.add_query(t1,:should)
b1.add_query(t2,:should)
filter = QueryFilter.new(b1)

filtered_query = FilteredQuery.new(t, filter)
puts i.search(filtered_query)

> >Now, I have a bit of a "noob" question... Is there a way to apply
> >multiple filters to the same query?
> >For example, I want to apply both
> >a RangeFilter and a QueryFilter, but from what I've seen in the API
> >docs, the :filter parameter that can be passed to the Searcher accepts
> >only one filter. It does mention the possibility of applying filters
> >to each other, but provides no examples.
> >
> >Is this possible? How can it be done?

Build FilteredQueries like above and chain them together:

i << {:color => 'red green', :content => 'frog'}

# Search for "frog"
t = TermQuery.new(:content, "frog")

# Filter on color=red
f1 = QueryFilter.new(TermQuery.new(:color,"red"))
# Filter on color=green
f2 = QueryFilter.new(TermQuery.new(:color,"green"))

fq1 = FilteredQuery.new(t, f1)
fq2 = FilteredQuery.new(fq1, f2)
puts i.search(fq2)

Cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From me at benjaminarai.com Sun Mar 2 13:25:21 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Sun, 02 Mar 2008 10:25:21 -0800
Subject: [Ferret-talk] Possible bug when creating a Ferret::Search::Sort object?
In-Reply-To: <20080302171012.GA28250@cordoba.webit.de>
References: <20080302171012.GA28250@cordoba.webit.de>
Message-ID: <47CAF111.2040405@benjaminarai.com>

Works fine for me as well using CentOS 5 and OSX Leopard.

Benjamin

Jens Kraemer wrote:
> Hi,
>
> this works fine for me with 0.11.6 with stock Ubuntu and Debian Ruby
> versions:
>
> ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] (Ubuntu 7.10)
> ruby 1.8.5 (2006-08-25) [x86_64-linux] (Debian 4.0 (stable))
>
>
> irb(main):001:0> require 'ferret'
> => true
> irb(main):002:0> Ferret::Search::Sort.new(
> irb(main):003:1* [
> irb(main):004:2* Ferret::Search::SortField::SCORE,
> irb(main):005:2* Ferret::Search::SortField::DOC_ID
> irb(main):006:2> ],
> irb(main):007:1* true
> irb(main):008:1> )
> => Sort[!, !]
> irb(main):009:0> Ferret::VERSION
> => "0.11.6"
>
>
> Cheers,
> Jens
>
> On Fri, Feb 29, 2008 at 11:19:30PM -0300, Bira wrote:
>
>> I may have run across a bug in Ferret: it throws a segmentation fault
>> when I try to create a Sort object using the default fields (SCORE and
>> DOC_ID), but setting reverse to true.
>>
>> Here's the minimal example:
>>
>> #!/usr/bin/env ruby
>> require 'rubygems'
>> require 'ferret'
>>
>> Ferret::Search::Sort.new
>>
>> Ferret::Search::Sort.new(
>>   [
>>     Ferret::Search::SortField::SCORE,
>>     Ferret::Search::SortField::DOC_ID
>>   ],
>>   false
>> )
>>
>> Ferret::Search::Sort.new(
>>   [
>>     Ferret::Search::SortField::SCORE_REV,
>>     Ferret::Search::SortField::DOC_ID_REV
>>   ],
>>   false
>> )
>>
>> Ferret::Search::Sort.new(
>>   [
>>     Ferret::Search::SortField::SCORE,
>>     Ferret::Search::SortField::DOC_ID
>>   ],
>>   true
>> )
>>
>> You should get something like this when creating the last object:
>>
>> $ ruby sort.rb
>> sort.rb:23: [BUG] Segmentation fault
>> ruby 1.8.6 (2007-09-24) [x86_64-linux]
>>
>> Aborted
>>
>>
>> Again, this is with Ferret 0.11.6 in Linux.
>>
>> Is this a known problem that's being worked on, or should I report it
>> at the Trac tool on ferret.davebalmain.com?
>>
>> --
>> Bira
>> http://compexplicita.wordpress.com
>> http://compexplicita.tumblr.com
>> _______________________________________________
>> Ferret-talk mailing list
>> Ferret-talk at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/ferret-talk
>>
>>
>
>

From u.alberton at gmail.com Mon Mar 3 05:52:44 2008
From: u.alberton at gmail.com (Bira)
Date: Mon, 3 Mar 2008 07:52:44 -0300
Subject: [Ferret-talk] Possible bug when creating a Ferret::Search::Sort object?
In-Reply-To: <20080302171012.GA28250@cordoba.webit.de>
References: <20080302171012.GA28250@cordoba.webit.de>
Message-ID:

On Sun, Mar 2, 2008 at 2:10 PM, Jens Kraemer wrote:
> Hi,
>
> this works fine for me with 0.11.6 with stock Ubuntu and Debian Ruby
> versions:
>
> ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] (Ubuntu 7.10)
> ruby 1.8.5 (2006-08-25) [x86_64-linux] (Debian 4.0 (stable))

I'm using Ferret 0.11.6 on Gentoo Linux:

ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]

So maybe this is a Ruby 1.8.6-p111 bug?

--
Bira
http://compexplicita.wordpress.com
http://compexplicita.tumblr.com

From rakrok at gmail.com Mon Mar 3 21:02:45 2008
From: rakrok at gmail.com (Rak Rok)
Date: Mon, 3 Mar 2008 21:02:45 -0500
Subject: [Ferret-talk] Search memory usage
Message-ID: <67f853a30803031802l21701775s91dfccae65e6754d@mail.gmail.com>

Hello,

I'm a new user of Ferret, so this might be a silly question. I was
wondering, where can I find details about search memory usage? I read
the O'Reilly booklet + googled but couldn't find much info.

There is a good explanation of how memory is used at indexing time
[bound by, amongst other things, :max_buffer_memory and
:max_buffered_docs]. But how does it work at search time - do the same
options apply? Will parts of the index be cached as they are accessed?
How about search results, do they get cached until :max_buffer_memory
and/or :max_buffered_docs is reached?

I understand that the OS will perform page caching - but that is beyond
my control. I'm more interested in Ferret's memory behavior.

In case it matters, this is because I'm building a server task that
searches across many indexes. Ideally, the most used indexes will remain
cached in memory. When a request comes in for an index not already in
memory, I'll swap out the least recently used index and replace it with
one that can satisfy the incoming request.
I'm bound by how much memory I can use [this is a 32bit task], and so it
would be good if I can bound each index's memory usage as well.

Thanks in advance for any info,
-rr-

From briangan at email.com Wed Mar 5 12:06:38 2008
From: briangan at email.com (Brian Gan)
Date: Wed, 05 Mar 2008 09:06:38 -0800
Subject: [Ferret-talk] Index Searcher Causes GC Memory Error: "irb: double free or corruption"
Message-ID: <47CED31E.6070702@email.com>

My linux Ruby application is using Ferret 0.11.4. I created my own class
IndexSearcher to contain the Searcher of multiple directories. If
searcher.close is not called, the end of runner/console or runner/server
fails with this system error:

*** glibc detected *** irb: double free or corruption (fasttop): 0x0a51d6c0 ***
======= Backtrace: =========
/lib/libc.so.6[0x638ac1]
/lib/libc.so.6(cfree+0x90)[0x63c0f0]
/usr/lib/ruby/gems/gems/ferret-0.11.4/lib/ferret_ext.so[0x247d75]
/usr/lib/ruby/gems/gems/ferret-0.11.4/lib/ferret_ext.so[0x219745]
/usr/lib/libruby.so.1.8(rb_gc_call_finalizer_at_exit+0xa7)[0xc3e237]
/usr/lib/libruby.so.1.8[0xc239e7]
/usr/lib/libruby.so.1.8(ruby_cleanup+0x100)[0xc2c280]
/usr/lib/libruby.so.1.8(ruby_stop+0x1d)[0xc2c3ad]
/usr/lib/libruby.so.1.8[0xc372c1]
irb[0x804868d]
/lib/libc.so.6(__libc_start_main+0xe0)[0x5e5390]
irb[0x8048581]

Here is the code of my class. Any sign of what is wrong with the memory
handling?

class IndexSearcher
  attr_accessor :searcher, :sub_searchers, :object_type

  # @param paths [Array of String] full local paths
  def initialize( object_type, paths )
    # Would've used this way since it's simpler and said by author to be
    # faster; but invalid paths will break this entirely
    # self.searcher = Ferret::Search::Searcher.new(Ferret::Index::IndexReader.new(paths) )
    self.sub_searchers = []
    paths.each do |cur_path|
      begin
        sub_s = Ferret::Search::Searcher.new(cur_path )
        self.sub_searchers << sub_s if sub_s
      rescue Exception => index_e
        puts "** IndexSearcher.new: #{index_e.message}"
      end
    end
    if self.sub_searchers.size > 0
      self.searcher = Ferret::Search::MultiSearcher.new(self.sub_searchers)
    else
      self.searcher = (Ferret::I.new).searcher
    end
    self.object_type = object_type
  end

  # This doc/[] differs from the Index method: the argument must be a
  # doc_id [Integer]
  def doc(doc_id)
    return self.searcher[doc_id]
  rescue Exception => e
    puts "** IndexSearcher(#{object_type}).doc #{doc_id}\n" << e.backtrace.join("\n")
    return nil
  end
  alias :[] :doc

  #############################
  # Querying methods
  def process_query(query)
    query.is_a?(Ferret::Search::Query) ?
      query : SearchIndex::QUERY_PARSER.parse(query)
  end

  def search(query, options = {})
    query_obj = process_query(query)
    self.searcher.search( query_obj, options )
  end

  # Forwards the caller's block, since search_each yields each hit
  def search_each(query, options = {}, &block)
    query_obj = process_query(query)
    self.searcher.search_each( query_obj, options, &block )
  end

  def explain(query, doc)
    query_obj = process_query(query)
    self.searcher.explain( query_obj, doc )
  end

  # Segmentation Fault when index_searcher.highlight
  def highlight(query, doc_id, options = {})
    query_obj = process_query(query)
    doc = self.searcher[doc_id]
    field_value = doc[options[:field] ]
    field_index = Ferret::I.new(:analyzer => SearchIndex::STEMMING_ANALYZER)
    field_index << {:keywords => field_value}
    fvh = field_index.highlight(query, 0, options.merge({ :field => :keywords }) )
    return fvh
  rescue Exception => e
    puts "** IndexSearch.highlight('#{query}'): #{e.message}" # << e.backtrace.join("\n")
    return field_value
  end

  def doc_freq(field, term)
    self.searcher.doc_freq(field, term)
  end

  #############################
  # Self Util methods
  def reader
    self.searcher.reader
  end

  def size
    if reader
      total_size = reader.num_docs
    else
      total_size = 0
      self.sub_searchers.each do |sub_searcher|
        total_size += sub_searcher.reader.num_docs if sub_searcher.reader
      end
      return total_size
    end
  end
  alias :num_docs :size

  def close
    self.searcher.close
  end
end

From samuelgiffney at gmail.com Thu Mar 13 00:01:26 2008
From: samuelgiffney at gmail.com (Sam Giffney)
Date: Thu, 13 Mar 2008 05:01:26 +0100
Subject: [Ferret-talk] acts_as_ferret and associations
In-Reply-To:
References: <8e76359318638fb9f0d8ac5f5994aa6f@ruby-forum.com>
 <20061107073623.GA27583@cordoba.webit.de>
 <1012fbe7cb9fb07606c33a80dd630701@ruby-forum.com>
 <38eeab4a612282e8f7d83b9fd70cc5bc@ruby-forum.com>
Message-ID:

Followed this thread and have a has_many_through association indexing
fine. The only problem is the N+1 sql queries incurred with an index
rebuild, full or partial.

Is there any way to enable eager loading on batch index updates?

Cheers,
Sam
--
Posted via http://www.ruby-forum.com/.

From jk at jkraemer.net Wed Mar 19 14:26:30 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Wed, 19 Mar 2008 19:26:30 +0100
Subject: [Ferret-talk] acts_as_ferret and associations
In-Reply-To:
References: <8e76359318638fb9f0d8ac5f5994aa6f@ruby-forum.com>
 <20061107073623.GA27583@cordoba.webit.de>
 <1012fbe7cb9fb07606c33a80dd630701@ruby-forum.com>
 <38eeab4a612282e8f7d83b9fd70cc5bc@ruby-forum.com>
Message-ID: <20080319182630.GV18068@thunder.jkraemer.net>

Hi!

On Thu, Mar 13, 2008 at 05:01:26AM +0100, Sam Giffney wrote:
> Followed this thread and have a has_many_through association indexing
> fine. The only problem is the N+1 sql queries incurred with an index
> rebuild, full or partial.
>
> Is there any way to enable eager loading on batch index updates?

Yes. Acts_as_ferret defines a class method named records_for_rebuild on
your model, which you may override to use a customized find call. See
lib/class_methods.rb for the original implementation.

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From palasrinivasarao14 at gmail.com Wed Mar 19 23:32:35 2008
From: palasrinivasarao14 at gmail.com (srinivas rao)
Date: Thu, 20 Mar 2008 09:02:35 +0530
Subject: [Ferret-talk] Ferret installation problem
Message-ID: <3881e8fc0803192032tcc9f77fx744a7625cfdfa0da@mail.gmail.com>

Hi All,
I got one problem in the installation of ferret. I am using rails 2.0.2
and IDE is radrails. When I am going to install ferret, it is showing
one error, i.e. failed to build gem native extension.
And
D:/Program Files/ruby/bin/ruby.exe extconf.rb install ferret
D:/Program is not recognized as an internal or external command, operable
program or batch file.

I am new to ROR. But I want to use the ferret. Please help me in this issue.

Thank you,
srinivas rao.pala
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20080320/702ae57b/attachment.html

From kraemer at webit.de Thu Mar 20 08:45:59 2008
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 20 Mar 2008 13:45:59 +0100
Subject: [Ferret-talk] Ferret installation problem
In-Reply-To: <3881e8fc0803192032tcc9f77fx744a7625cfdfa0da@mail.gmail.com>
References: <3881e8fc0803192032tcc9f77fx744a7625cfdfa0da@mail.gmail.com>
Message-ID: <20080320124559.GE12053@cordoba.webit.de>

Looks like installing your ruby into a location without spaces in the
path might help.

Cheers,
Jens

On Thu, Mar 20, 2008 at 09:02:35AM +0530, srinivas rao wrote:
> Hi All,
> I got one problem in the installation of ferret. I am using rails 2.0.2
> and IDE is radrails. When I am going to install ferret, it is showing
> one error, i.e. failed to build gem native extension.
> And
> D:/Program Files/ruby/bin/ruby.exe extconf.rb install ferret
> D:/Program is not recognized as an internal or external command, operable
> program or batch file.
>
> I am new to ROR. But I want to use the ferret. Please help me in this issue.
>
> Thank you,
> srinivas rao.pala

> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From logicalcat at gmail.com Tue Mar 25 01:29:14 2008
From: logicalcat at gmail.com (R. Bryan Hughes)
Date: Mon, 24 Mar 2008 23:29:14 -0600
Subject: [Ferret-talk] parallel indexing with unique id?
Message-ID:

Hello all,
Is it possible to use parallel indexing and still ensure unique documents in
the merged index? Using the canned example, I'm ending up with non-unique
entries. It's just adding them all together even though I've defined a
unique :key.

How can I tell the IndexWriter to keep my uniqueness constraints?

For example, imagine that I have two indexes of a phone book: "index_one"
contains a unique set of names A-through-P (let's say the key is their
phone number). "index_two" contains a unique set of names K-through-Z.
When I merge them, I would hope to get a unique index of A-through-Z, but
I'm getting double entries where they overlap, K-through-P.

Here's some code to demonstrate. My :id field is a long-ish unique
alphanumeric string. In the example below, "one" and "two" are actually
identical copies, each containing about 60,000 docs. I was hoping to get a
combined index containing the same 60,000 docs, but ended up with 120,000.

Any help will be greatly appreciated. Thanks!

####################
# requires and includes so the snippet resolves the Ferret class names
require 'rubygems'
require 'ferret'
include Ferret::Analysis
include Ferret::Index

one = "Documents/bucket/index_1"
two = "Documents/bucket/index_2"
merged = "Documents/bucket/merged_index"

pfa = PerFieldAnalyzer.new(LetterAnalyzer.new)
pfa[:id] = WhiteSpaceAnalyzer.new

field_infos = FieldInfos.new(:term_vector => :no)
field_infos.add_field(:id, :index => :untokenized)

# index_one opens path "one" and index_two opens path "two"
index_one = Ferret::I.new(
  :key => :id,
  :max_buffer_memory => 0x8000000,
  :merge_factor => 5,
  :path => one,
  :analyzer => pfa,
  :field_infos => field_infos)

index_two = Ferret::I.new(
  :key => :id,
  :max_buffer_memory => 0x8000000,
  :merge_factor => 5,
  :path => two,
  :analyzer => pfa,
  :field_infos => field_infos)

readers = []
readers << IndexReader.new(one)
readers << IndexReader.new(two)

puts "size of index_one = "+index_one.size.to_s
puts "size of index_two = "+index_two.size.to_s

index_writer = IndexWriter.new(:path => merged)
index_writer.add_readers(readers)
index_writer.close()
readers.each{ |reader| reader.close() }

i = Ferret::I.new(:path => merged)
puts "size before optimize = "+i.size.to_s
i.optimize
puts "size after optimize = "+i.size.to_s
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20080324/eacc1c0d/attachment.html

From jk at jkraemer.net Tue Mar 25 04:12:47 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Tue, 25 Mar 2008 09:12:47 +0100
Subject: [Ferret-talk] parallel indexing with unique id?
In-Reply-To:
References:
Message-ID: <20080325081247.GI13705@thunder.jkraemer.net>

Hi!

On Mon, Mar 24, 2008 at 11:29:14PM -0600, R. Bryan Hughes wrote:
> Hello all,
> Is it possible to use parallel indexing and still ensure unique documents in
> the merged index? Using the canned example, I'm ending up with non-unique
> entries. It's just adding them all together even though I've defined a
> unique :key.
>
> How can I tell the IndexWriter to keep my uniqueness constraints?

You can't. The :key option is only interpreted by Ferret's Index class,
which will delete any already existing records with the same key field
value before adding a new record.

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From hulme at ebi.ac.uk Tue Mar 25 05:33:22 2008
From: hulme at ebi.ac.uk (Robert Hulme)
Date: Tue, 25 Mar 2008 09:33:22 +0000
Subject: [Ferret-talk] Which field(s) matched?
Message-ID: <47E8C6E2.5000202@ebi.ac.uk>

Is there a way to discover which fields in a document matched the search
that was performed?

I've just started with Ferret, but so far I can only get back the ids of
the documents that were matched (along with the score).

I'm aware there is a highlighting method, but I was hoping for something
computer readable (unless I'm misunderstanding what data can be gained
from that method).

Thanks
-Rob

From logicalcat at gmail.com Wed Mar 26 11:22:55 2008
From: logicalcat at gmail.com (R. Bryan Hughes)
Date: Wed, 26 Mar 2008 09:22:55 -0600
Subject: [Ferret-talk] parallel indexing with unique id?
In-Reply-To: <20080325081247.GI13705@thunder.jkraemer.net>
References: <20080325081247.GI13705@thunder.jkraemer.net>
Message-ID:

Thanks!
You saved me lots of time.

On 3/25/08, Jens Kraemer wrote:
>
> Hi!
>
> On Mon, Mar 24, 2008 at 11:29:14PM -0600, R. Bryan Hughes wrote:
> > Hello all,
> > Is it possible to use parallel indexing and still ensure unique
> > documents in the merged index? Using the canned example, I'm ending
> > up with non-unique entries. It's just adding them all together even
> > though I've defined a unique :key.
> >
> > How can I tell the IndexWriter to keep my uniqueness constraints?
>
> You can't. The :key option is only interpreted by Ferret's Index class,
> which will delete any already existing records with the same key field
> value before adding a new record.
>
> Cheers,
> Jens
>
>
> --
> Jens Krämer
> http://www.jkraemer.net/ - Blog
> http://www.omdb.org/ - The new free film database
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20080326/8019cc5f/attachment.html
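
Editorial sketch for the "parallel indexing with unique id?" thread: per
Jens's reply, only the Index class interprets :key, deleting any existing
record with the same key value before adding the new one. The plain-Ruby
model below applies that delete-then-add rule to two overlapping document
sets. It does not call Ferret at all; merge_by_key and the sample
documents are hypothetical, chosen only to show why a keyed merge ends up
with one entry per id where a plain concatenation would not.

```ruby
# Hypothetical helper, not part of Ferret: models "delete any existing
# document with the same key value, then add the new one" on plain hashes.
def merge_by_key(key, *indexes)
  merged = {}
  indexes.each do |docs|
    docs.each do |doc|
      merged.delete(doc[key])  # drop the older document, if any
      merged[doc[key]] = doc   # add the newer one
    end
  end
  merged.values
end

# Two made-up "indexes" that overlap on id 'k7'
index_one = [{:id => 'a1', :name => 'Alice'}, {:id => 'k7', :name => 'Kim'}]
index_two = [{:id => 'k7', :name => 'Kim'},   {:id => 'z9', :name => 'Zoe'}]

merged = merge_by_key(:id, index_one, index_two)
puts "size of plain concatenation = #{(index_one + index_two).size}"
puts "size of keyed merge = #{merged.size}"
```

With Ferret itself, the equivalent would presumably be to re-add each
document through an Index opened with :key => :id instead of using
IndexWriter#add_readers; that variant is an assumption and is untested
here.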