From aross at nindy.com Thu Mar 1 03:37:57 2007
From: aross at nindy.com (Andy Ross)
Date: Thu, 1 Mar 2007 09:37:57 +0100
Subject: [Ferret-talk] [ANN] Ferret 0.11.2-rc3 released
In-Reply-To:
References:
Message-ID: <9499a868b6e9dc344aebe3ee82d4fac4@ruby-forum.com>

David Balmain wrote:
> Hey guys,
>
> I've just removed the -fno-stack-protector flag from the release so
> those who had trouble because of this should now be able to install
> 0.11.2-rc3. If you have any problems with this release, please let me
> know ASAP.
>
> Cheers,
> Dave

Dave,

Thanks for getting out this release. Unfortunately, I am still seeing some crashing w/r/t ferret:

/usr/lib/ruby/gems/1.8/gems/ferret-0.11.2/lib/ferret/index.rb:384: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i486-linux]

This is a Rails site that has multiple servers (mongrel) and also multiple background processes that access the same index. I was under the assumption that this new version was supposed to fix these issues, but perhaps that is not true? This is an ubuntu box.

Thanks for your help!

-Andy

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Thu Mar 1 03:40:13 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 1 Mar 2007 09:40:13 +0100
Subject: [Ferret-talk] How to use find_options in multi_search
In-Reply-To:
References:
Message-ID: <20070301084013.GJ10083@cordoba.webit.de>

Hi Mischa!

On Wed, Feb 28, 2007 at 09:58:18PM +0100, Mischa Berger wrote:
> Hello everyone,
>
> I'm using multi-search to search in some attributes of two classes. One
> of the attributes is the id of the customer. For each multi_search I
> want to give the id of the current customer as a parameter. This
> should only return results for the given customer.
>
> My current code looks like this:
>
> Folder.multi_search(@search_query, [Myfile])
>
> I noticed in the API you can add options and find_options to this. How
> do I use these parameters, so only results for a certain customer are
> returned?
With aaf trunk you can do

  Folder.multi_search(@search_query, [Myfile], {}, { :conditions => [ "customer_id=?", @customer.id ] })

As there's no way to specify per-model find_options, you need a customer_id field in both models, Myfile and Folder. However, it would not be that hard to add a model dimension to the find_options hash for multi_search (patches welcome ;-).

aaf 0.3.1 (stable) doesn't support find_options with multi_search.

Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Phone +49 351 46766-0 | Fax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Thu Mar 1 03:47:47 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 1 Mar 2007 09:47:47 +0100
Subject: [Ferret-talk] Problems using acts_as_ferret
In-Reply-To:
References: <16A686AE-41AF-41E6-8FB1-FF8DC5A0CF01@gmail.com> <20070228155023.GI10083@cordoba.webit.de>
Message-ID: <20070301084747.GL10083@cordoba.webit.de>

On Wed, Feb 28, 2007 at 05:17:02PM +0100, Linus wrote:
[..]
>
> Need to find out how to generate the index on data export, not from web
> app :P

Have a look at backgroundrb - it's great for such long-running tasks.

cheers,
Jens

From mischa78 at xs4all.nl Thu Mar 1 07:22:01 2007
From: mischa78 at xs4all.nl (Mischa Berger)
Date: Thu, 1 Mar 2007 13:22:01 +0100
Subject: [Ferret-talk] How to use find_options in multi_search
In-Reply-To: <20070301084013.GJ10083@cordoba.webit.de>
References: <20070301084013.GJ10083@cordoba.webit.de>
Message-ID: <47d406bee702a6e15beb004a40ea141e@ruby-forum.com>

Hi Jens,

> With aaf trunk you can do
>
> Folder.multi_search(@search_query, [Myfile], {}, { :conditions => [
> "customer_id=?", @customer.id ] })

Thanks for your answer. When do you expect this to be in a stable release?

I'm assuming the customer_id field needs to be indexed too for this to work, is that correct?

Cheers,
Mischa.

--
http://boxroom.rubyforge.org

From ror at philippeapril.com Thu Mar 1 08:05:37 2007
From: ror at philippeapril.com (Philippe April)
Date: Thu, 01 Mar 2007 08:05:37 -0500
Subject: [Ferret-talk] Need help creating my own Filter in Ruby
In-Reply-To:
References:
Message-ID: <7E5B1A4B-0C13-48EF-81E0-BB8EB0D4A95F@philippeapril.com>

Hi Dave,

I hear you... I'll try to make something up for you... Thanks :)

But just so I know: implementing a Filter IS the right solution to this, right?

On 28-Feb-07, at 11:56 PM, David Balmain wrote:

> On 3/1/07, Philippe April wrote:
>> I posted a Trac ticket about it, but I thought I'd ask the mailing
>> list to reach more people.
>
> Hi Philippe,
>
> I'd love to help you with this but I can't reproduce it here. If you
> can modify the example I gave under your ticket to reproduce the
> problem or produce your own self contained failing test I will be able
> to fix the problem right away. Otherwise I waste too much time trying
> to reproduce the problem.
>
> Cheers,
> Dave
>
> --
> Dave Balmain
> http://www.davebalmain.com/

From ror at philippeapril.com Thu Mar 1 08:27:45 2007
From: ror at philippeapril.com (Philippe April)
Date: Thu, 01 Mar 2007 08:27:45 -0500
Subject: [Ferret-talk] Need help creating my own Filter in Ruby
In-Reply-To:
References:
Message-ID: <5ABAB199-45F0-4796-8987-C19688A2F98E@philippeapril.com>

Hi Dave,

I just put a way to reproduce it in the Trac ticket. My filter seems to work fine when it's included alone with a StandardTokenizer only, but as soon as I put another kind of filter in the chain (I used HyphenFilter here, but it gives the same error with any other filter), errors show up randomly.

See for yourself, I hope you can trigger the error too :)

On 28-Feb-07, at 11:56 PM, David Balmain wrote:

> On 3/1/07, Philippe April wrote:
>> I posted a Trac ticket about it, but I thought I'd ask the mailing
>> list to reach more people.
>
> Hi Philippe,
>
> I'd love to help you with this but I can't reproduce it here. If you
> can modify the example I gave under your ticket to reproduce the
> problem or produce your own self contained failing test I will be able
> to fix the problem right away. Otherwise I waste too much time trying
> to reproduce the problem.
>
> Cheers,
> Dave

From kraemer at webit.de Thu Mar 1 08:42:08 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 1 Mar 2007 14:42:08 +0100
Subject: [Ferret-talk] How to use find_options in multi_search
In-Reply-To: <47d406bee702a6e15beb004a40ea141e@ruby-forum.com>
References: <20070301084013.GJ10083@cordoba.webit.de> <47d406bee702a6e15beb004a40ea141e@ruby-forum.com>
Message-ID: <20070301134208.GM10083@cordoba.webit.de>

On Thu, Mar 01, 2007 at 01:22:01PM +0100, Mischa Berger wrote:
> Hi Jens,
>
> > With aaf trunk you can do
> >
> > Folder.multi_search(@search_query, [Myfile], {}, { :conditions => [
> > "customer_id=?", @customer.id ] })
>
> Thanks for your answer.
> When do you expect this to be in a stable
> release?

As I'm quite busy at the moment, that might take a while (2 or 3 weeks). However, the current trunk is not 'unstable' - it's in a usable state and passes all unit tests, so you should give it a try if you need that feature now.

> I'm assuming the customer_id field needs to be indexed too for this to
> work, is that correct?

No, the find_options are only used when fetching the results via ActiveRecord.

If you indexed the customer_id field you could just append "+customer_id:#{normalize(@customer.id)}" to the Ferret query string. (normalize should be a function that pads the customer id to a fixed-length string.)

Jens

From rainer.jung at gmail.com Thu Mar 1 09:00:08 2007
From: rainer.jung at gmail.com (Rainer Jung)
Date: Thu, 1 Mar 2007 15:00:08 +0100
Subject: [Ferret-talk] Error with new ferret
Message-ID: <14ae67820703010600k9c2f9dasb5db51b2efe78081@mail.gmail.com>

Hello there.

I get two different errors after I installed the new Ferret version. I haven't seen any comments about changes to the API, so there should be no problem using acts_as_ferret as I did before:

File Not Found Error occured at :117 in xpop_context
Error occured in fs_store.c:329 - fs_open_input
tried to open "/home/app/current/config/../index/production/client/_8km_1.del" but it doesn't exist:
/usr/lib64/ruby/gems/1.8/gems/ferret-0.11.2/lib/ferret/index.rb:285:in `delete'

The file _8km.cfs does exist, but not _8km_1.del (the same happens with _8km_0.del).
Then I have another error:

End-of-File Error occured at :117 in xpop_context
Error occured in store.c:216 - is_refill
current pos = 0, file length = 0
/usr/lib64/ruby/gems/1.8/gems/ferret-0.11.2/lib/ferret/index.rb:285:in `delete'

Somehow, both errors happen at the same place.

By the way, my application is running on an Intel 64-bit machine (Linux app 2.6.18 x86_64 x86_64 x86_64 GNU/Linux), and while compiling I got some warnings about invalid pointer casts. Could they cause problems?

Regards,
Rainer

From dbalmain.ml at gmail.com Thu Mar 1 09:20:24 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 2 Mar 2007 01:20:24 +1100
Subject: [Ferret-talk] Need help creating my own Filter in Ruby
In-Reply-To: <5ABAB199-45F0-4796-8987-C19688A2F98E@philippeapril.com>
References: <5ABAB199-45F0-4796-8987-C19688A2F98E@philippeapril.com>
Message-ID:

On 3/2/07, Philippe April wrote:
> Hi Dave,
>
> I just put a way to reproduce it in the Trac ticket. My filter seems
> to work fine when it's included alone with a StandardTokenizer only,
> but as soon as I put another kind of filter in the chain (I used
> HyphenFilter here, but it gives the same error with any other filter),
> errors show up randomly.
>
> See for yourself, I hope you can trigger the error too :)

Thanks Philippe, I'll get that fixed as soon as possible.

Cheers,
Dave

From dbalmain.ml at gmail.com Thu Mar 1 09:24:02 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 2 Mar 2007 01:24:02 +1100
Subject: [Ferret-talk] [ANN] Ferret 0.11.2-rc3 released
In-Reply-To: <9499a868b6e9dc344aebe3ee82d4fac4@ruby-forum.com>
References: <9499a868b6e9dc344aebe3ee82d4fac4@ruby-forum.com>
Message-ID:

On 3/1/07, Andy Ross wrote:
> David Balmain wrote:
> > Hey guys,
> >
> > I've just removed the -fno-stack-protector flag from the release so
> > those who had trouble because of this should now be able to install
> > 0.11.2-rc3.
> > If you have any problems with this release, please let me
> > know ASAP.
> >
> > Cheers,
> > Dave
>
> Dave,
>
> Thanks for getting out this release. Unfortunately, I am still seeing
> some crashing w/r/t ferret:
>
> /usr/lib/ruby/gems/1.8/gems/ferret-0.11.2/lib/ferret/index.rb:384: [BUG]
> Segmentation fault
> ruby 1.8.4 (2005-12-24) [i486-linux]
>
> This is a Rails site that has multiple servers (mongrel) and also
> multiple background processes that access the same index. I was under
> the assumption that this new version was supposed to fix these issues,
> but perhaps that is not true? This is an ubuntu box.
>
> Thanks for your help!

Hi Andy,

I wish that it was just one or two problems to fix, but there are still a few issues. If you can reproduce this bug in a self-contained test, it would be really helpful.

Cheers,
Dave

From caleb at inforadical.net Thu Mar 1 10:24:28 2007
From: caleb at inforadical.net (Caleb Clausen)
Date: Thu, 01 Mar 2007 07:24:28 -0800
Subject: [Ferret-talk] FerretHash
In-Reply-To:
References:
Message-ID: <45E6F02C.9020701@inforadical.net>

Dave, thank you so much for the 0.11 release(s). You have solved many problems for me. As part of my appreciation for your good works, I am offering up for public consideration a silly little class that I wrote. (Code is below.) This class offers a simplified Hash-like interface to (a very restricted subset of) Ferret. Hence I call it FerretHash.

FerretHash comes with its very own pet Ferret bug. Run the crude unit test to see the problem. (Long story short, it looks like term frequency, as reported by IndexReader#terms, does not take deletions into account.)
require 'rubygems'
require 'ferret'
require 'tempfile'
require 'fileutils'

class FerretHash
  def initialize(name=nil)
    #make temp file name unless a path was given
    unless name
      tf=Tempfile.new("ferrethash_#$$")
      name=tf.path
      tf.close!   # close and unlink, so the name can be reused for the index dir
    end
    #open new ferret index with temp name
    @name=name
    open_writer
  end

  def open_writer
    @writer and return
    #a schema for the hash...
    fis=Ferret::Index::FieldInfos.new
    fis.add_field(:key, :index=>:untokenized, :store=>:no, :term_vector=>:no)
    fis.add_field(:value, :index=>:no, :store=>:yes, :term_vector=>:no)
    @writer=Ferret::Index::IndexWriter.new(:path=>@name, :field_infos=>fis,
                                           :create_if_needed=>true, :analyzer=>nil)
  end

  def close_writer
    @writer.close
    @writer=nil
  end

  def close
    @writer.close
    @writer=nil
    @name=nil
  end

  def destroy
    name=@name
    close
    FileUtils.rm_r(name)   # was: `rm -r #{name}`
    nil
  end

  def path
    @name
  end

  def [](key)
    reader=Ferret::Index::IndexReader.new(@name)
    searcher=Ferret::Search::Searcher.new(reader)
    td=searcher.search(Ferret::Search::TermQuery.new(:key, key), :limit=>1)
    case td.total_hits
    when 0                 # key not present; result stays nil
    when 1 then result=reader[td.hits.first.doc][:value]
    else fail
    end
    searcher.close
    reader.close
    return result
  end

  def delete(key)
    reader=Ferret::Index::IndexReader.new(@name)
    searcher=Ferret::Search::Searcher.new(reader)
    td=searcher.search(Ferret::Search::TermQuery.new(:key, key), :limit=>1)
    case td.total_hits
    when 0                 # do nothing
    when 1
      close_writer
      docnum=td.hits.first.doc
      result=reader[docnum][:value]
      reader.delete docnum
      reader.commit
    else fail
    end
    searcher.close
    reader.close
    open_writer
    result
  end

  def []=(key,value)
    delete key
    @writer << {:key=>key, :value=>value}
    @writer.commit
    return value
  end

  def set_fast!(key, value)
    @writer << {:key=>key, :value=>value}
  end

  def sync
    @writer.commit
  end

  def keys
    reader=Ferret::Index::IndexReader.new(@name)
    result=reader.terms(:key).extend(Enumerable).map{|term,freq|
      freq==1 or fail
      term
    }
    reader.close
    return result
  end

  def values
    result=[]
    reader=Ferret::Index::IndexReader.new(@name)
    reader.max_doc.times{|n|
      result << reader[n][:value] unless reader.deleted? n
    }
    reader.close
    result
  end

  def each_key
    reader=Ferret::Index::IndexReader.new(@name)
    reader.terms(:key).extend(Enumerable).each{|term,freq|
      freq==1 or fail
      yield term
    }
    reader.close
    return self
  end

  def each
    each_key{|k| yield k,self[k] }
  end

  include Enumerable
end

if __FILE__==$0
  fh=FerretHash.new
  keys=("a".."m").to_a
  vals=("n".."z").to_a
  keys.size.times{|i| fh[keys[i]]=vals[i] }
  keys.size.times{|i| fh[keys[i]]==vals[i] or fail }
  fh.keys.sort==keys or fail
  fh.values.sort==vals or fail
  fh["a"]="N"
  fh["a"]=="N" or fail
  fh.keys.sort==keys or fail
  fh.values.sort==["N"]+vals[1..-1] or fail
  fh.destroy
end

From wmorgan-ferret at masanjin.net Thu Mar 1 10:44:10 2007
From: wmorgan-ferret at masanjin.net (William Morgan)
Date: Thu, 01 Mar 2007 07:44:10 -0800
Subject: [Ferret-talk] [ANN] Ferret 0.11.1-rc2
In-Reply-To: <45E58939.30602@videotron.ca>
References: <91e51289bc18ec88ea70c74878699846@ruby-forum.com> <280d3574ed6d637d5cef710b4a82788c@ruby-forum.com> <0585b5ea7f025226d8bd2914979c7d64@ruby-forum.com> <45E58939.30602@videotron.ca>
Message-ID: <1172763278-sup-835@south>

Excerpts from Patrick Ritchie's message of Wed Feb 28 05:52:57 -0800 2007:
> Have you looked at Hoe? It includes a Rake task to package and upload
> to RubyForge.

I use Hoe for several projects and it's great. The Rakefile for Sup (a complex package with 6 dependencies) is only 47 lines and has tasks for building the gem, uploading it to RubyForge and posting announcements.

The only issues I have are:

a) it expects a very specific directory setup, and if you are fitting it onto a pre-existing project you either have to conform or be very careful (e.g. it expects doc/ to contain only generated rdoc pages, and so deletes that directory with no warning for several tasks); and

b) it adds itself as a dependency to the gem, for no good reason, and you have to monkey-patch it to stop that.
--
William

From my at email.com Thu Mar 1 14:18:32 2007
From: my at email.com (mix)
Date: Thu, 1 Mar 2007 20:18:32 +0100
Subject: [Ferret-talk] ferret or not ferret?
Message-ID:

Hi, I have to choose a search engine for a medium-big site with a lot of searches and inserts at the same time. What would you suggest? I'm thinking about Ferret, but I read that it has some problems with this kind of "work" :(

From mischa78 at xs4all.nl Thu Mar 1 15:01:51 2007
From: mischa78 at xs4all.nl (Mischa Berger)
Date: Thu, 1 Mar 2007 21:01:51 +0100
Subject: [Ferret-talk] How to use find_options in multi_search
In-Reply-To: <20070301134208.GM10083@cordoba.webit.de>
References: <20070301084013.GJ10083@cordoba.webit.de> <47d406bee702a6e15beb004a40ea141e@ruby-forum.com> <20070301134208.GM10083@cordoba.webit.de>
Message-ID: <780ae0fef4f64c48af182a5e54e54fd5@ruby-forum.com>

Thanks Jens,

I really appreciate your help.

> no, the find_options are only used when fetching the results via active
> record.
>
> If you indexed the customer_id field you could just append
> "+customer_id:#{normalize(@customer.id)}" to the ferret query string.

I have a few more questions.

At the moment I am indexing the customer_id too. Which of these two options would be better performance-wise?

> (normalize should be a function that pads the customer id to a fixed
> length string)

Why do I need to pad the customer id to a fixed-length string? I tried it without the padding and that seems to work. In case it's better to do the padding, should I find the id with the longest length and add spaces in front if the customer_id I'm searching for is shorter?

Thanks again!

Mischa.
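For illustration only, here is a minimal plain-Ruby sketch of the approach discussed in this exchange: pad the id to a fixed width and append a required clause to the query string. `normalize` and `restrict_to_customer` are hypothetical helpers (the thread only describes what `normalize` should do), the 10-character width is an arbitrary assumption, and zeros are used instead of spaces so the padded value stays a single whitespace-free term in the query string.

```ruby
# Hypothetical helper: pad a numeric id to a fixed width. The width of 10
# is an arbitrary assumption; zero padding keeps the value a single term.
def normalize(id, width = 10)
  id.to_s.rjust(width, "0")
end

# Hypothetical helper: append a required (+) clause so that only documents
# whose untokenized customer_id field equals the padded id can match.
def restrict_to_customer(query, customer_id)
  "#{query} +customer_id:#{normalize(customer_id)}"
end

puts restrict_to_customer("invoice", 42)
# prints: invoice +customer_id:0000000042
```

Whatever padding scheme is chosen, the same helper would have to be applied when customer_id is indexed, because the query term has to match the stored untokenized value exactly.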
From kraemer at webit.de Fri Mar 2 04:02:20 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 2 Mar 2007 10:02:20 +0100
Subject: [Ferret-talk] How to use find_options in multi_search
In-Reply-To: <780ae0fef4f64c48af182a5e54e54fd5@ruby-forum.com>
References: <20070301084013.GJ10083@cordoba.webit.de> <47d406bee702a6e15beb004a40ea141e@ruby-forum.com> <20070301134208.GM10083@cordoba.webit.de> <780ae0fef4f64c48af182a5e54e54fd5@ruby-forum.com>
Message-ID: <20070302090220.GB3652@cordoba.webit.de>

On Thu, Mar 01, 2007 at 09:01:51PM +0100, Mischa Berger wrote:
> Thanks Jens,
>
> I really appreciate your help.
>
> > no, the find_options are only used when fetching the results via active
> > record.
> >
> > If you indexed the customer_id field you could just append
> > "+customer_id:#{normalize(@customer.id)}" to the ferret query string.
>
> I have a few more questions.
>
> At the moment I am indexing the customer_id too. Which of these two
> options would be better performance-wise?

If there is exactly one customer per document, having the customer_id indexed would be better. However, the find_options are useful e.g. when you have to do joins to select the records a client may see.

> > (normalize should be a function that pads the customer id to a fixed
> > length string)
>
> Why do I need to pad the customer id to a fixed-length string? I
> tried it without the padding and that seems to work. In case it's better
> to do the padding, should I find the id with the longest length and add
> spaces in front if the customer_id I'm searching for is shorter?

I'm not sure about the normalizing thing; possibly you don't need it. But the field has to be untokenized. AFAIR the padding was only needed in earlier versions of Ferret for sorting by fields with numeric contents - even there it's not needed any more.

cheers,
Jens

From kraemer at webit.de Fri Mar 2 04:07:54 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 2 Mar 2007 10:07:54 +0100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To:
References:
Message-ID: <20070302090754.GC3652@cordoba.webit.de>

On Thu, Mar 01, 2007 at 08:18:32PM +0100, mix wrote:
> Hi, I have to choose a search engine for a medium-big site with a lot of
> searches and inserts at the same time. What would you suggest?
> I'm thinking about Ferret, but I read that it has some problems with
> this kind of "work" :(

Ferret recently had several improvements in this area (see Dave's posts about the recent release candidates). Even if you should still experience problems with multiple processes accessing the index, you can always set up a simple DRb server doing the indexing/search work. Or you can have a look at acts_as_ferret, which has such a server already built in. Not to mention the fact that acts_as_ferret would make the integration of Ferret-based full-text search into your app a one-liner :-)

Jens

From dbalmain.ml at gmail.com Fri Mar 2 04:13:03 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 2 Mar 2007 20:13:03 +1100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To:
References:
Message-ID:

On 3/2/07, mix wrote:
> Hi, I have to choose a search engine for a medium-big site with a lot of
> searches and inserts at the same time. What would you suggest?
> I'm thinking about Ferret, but I read that it has some problems with
> this kind of "work" :(

Ferret is getting better and better at this. The latest version still has a couple of bugs, but the current working version is very stable with multiple processes accessing the index. I've just stress-tested it with 10 search processes and 1 writer process for 24 hours without any problems. I will definitely have this release out before Monday. I think the next version would be perfect for what you are talking about.

solrb is also a good option, although it will be a little slower and you'll have to run Java on your server (not that this is a big deal).

From dbalmain.ml at gmail.com Fri Mar 2 06:01:45 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 2 Mar 2007 22:01:45 +1100
Subject: [Ferret-talk] FerretHash
In-Reply-To: <45E6F02C.9020701@inforadical.net>
References: <45E6F02C.9020701@inforadical.net>
Message-ID:

On 3/2/07, Caleb Clausen wrote:
> Dave, thank you so much for the 0.11 release(s). You have solved many
> problems for me. As part of my appreciation for your good works, I am
> offering up for public consideration a silly little class that I wrote.
> (Code is below.) This class offers a simplified Hash-like interface to
> (a very restricted subset of) Ferret. Hence I call it FerretHash.
>
> FerretHash comes with its very own pet Ferret bug. Run the crude unit
> test to see the problem. (Long story short, it looks like term
> frequency, as reported by IndexReader#terms, does not take deletions
> into account.)

Hey Caleb,

Unfortunately it would be too inefficient to change all of the term counts when you delete a document. What you can do is optimize the index before you iterate over the terms. For example:
For example; def keys @writer.optimize reader=Ferret::Index::IndexReader.new(@name) result=reader.terms(:key).extend(Enumerable).map{|term,freq| freq==1 or fail term } reader.close return result end Hope that makes sense. -- Dave Balmain http://www.davebalmain.com/ From my at email.com Fri Mar 2 07:44:45 2007 From: my at email.com (mix) Date: Fri, 2 Mar 2007 13:44:45 +0100 Subject: [Ferret-talk] ferret or not ferret? In-Reply-To: References: Message-ID: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> > cut ok :) another question about ferret, is it possible to do 2 kind of search? normal (which include the text to search and another field) and advanced (which has more option to select, part or all of them) ? -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Mar 2 09:13:41 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 2 Mar 2007 15:13:41 +0100 Subject: [Ferret-talk] ferret or not ferret? In-Reply-To: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> Message-ID: <20070302141341.GB30297@cordoba.webit.de> On Fri, Mar 02, 2007 at 01:44:45PM +0100, mix wrote: > > cut > > ok :) > another question about ferret, is it possible to do 2 kind of search? > normal (which include the text to search and another field) and advanced > (which has more option to select, part or all of them) ? that's no problem at all, you can build very complex and field-specific queries as well as issuing a simple 'give me all docs where term xyz is in any field' query. Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From erik at ehatchersolutions.com Fri Mar 2 06:03:18 2007 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Fri, 2 Mar 2007 06:03:18 -0500 Subject: [Ferret-talk] ferret or not ferret? 
In-Reply-To:
References:
Message-ID:

On Mar 1, 2007, at 2:18 PM, mix wrote:

> Hi, I have to choose a search engine for a medium-big site with a lot of
> searches and inserts at the same time. What would you suggest?
> I'm thinking about Ferret, but I read that it has some problems with
> this kind of "work" :(

I was lurking on this thread until Dave mentioned solrb.

First of all, I *love* Ferret. Dave is amazing, and the performance is fantastic. I have been groping for a Lucene in Ruby for a long time, even starting to tinker with it in a low-level, pure Ruby way myself.

When Solr came along I knew this hit the sweet spot I was looking for. It's all the greatness of Java Lucene, which is continually and rapidly being improved by many folks. Above and beyond just wrapping Lucene behind an HTTP interface, it adds a ton of great features on top: caching, replication, faceting, highlighting, and an incredibly active community. My expertise is in Java Lucene, so it felt right to me.

We've started a project called solr-ruby (it used to be named solrb, but we renamed it to be more readable and pronounceable) which provides a Ruby API to Solr. For example (from ):

  # connect to the solr instance
  conn = Connection.new('http://localhost:8983/solr', :autocommit => :on)

  # add a document to the index
  conn.add(:id => 123, :title_text => 'Lucene in Action')

  # update the document
  conn.update(:id => 123, :title_text => 'Solr in Action')

  # print out the first hit in a query for 'action'
  response = conn.query('action')
  print response.hits[0]

  # iterate through all the hits for 'action'
  conn.query('action') do |hit|
    puts hit.inspect
  end

  # delete document by id
  conn.delete(123)

On top of solr-ruby, we've also been building Solr Flare, a Rails-based front-end that presents a faceted and full-text search interface, including integration with SIMILE Exhibit and Timeline, and eventually also Atom feeds, saved searches, etc.
While I certainly don't want to steal any thunder from Ferret, because I think it is a great project, I feel compelled on this thread to bring up what I consider a top-notch alternative to Ferret. It would be very interesting to run some benchmarks comparing the two at a few levels: indexing speed, plain full-text query speed, and, most important to my work, the speed of generating facet information along with a query.

Erik

From my at email.com Fri Mar 2 10:20:35 2007
From: my at email.com (mix)
Date: Fri, 2 Mar 2007 16:20:35 +0100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To: <20070302141341.GB30297@cordoba.webit.de>
References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> <20070302141341.GB30297@cordoba.webit.de>
Message-ID: <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com>

Jens Kraemer wrote:
> On Fri, Mar 02, 2007 at 01:44:45PM +0100, mix wrote:
>> > cut
>>
>> OK :) Another question about Ferret: is it possible to do two kinds of search?
>> A normal one (which includes the text to search and another field) and an
>> advanced one (which has more options to select, some or all of them)?
>
> That's no problem at all: you can build very complex and field-specific
> queries as well as issue a simple 'give me all docs where term xyz is
> in any field' query.
>
> Jens

Perfect, I think I'll go with Ferret and acts_as_ferret :) I've also found this tutorial, which seems very good: http://www.railsenvy.com/2007/2/19/acts-as-ferret-tutorial

Thanks
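The "normal" versus "advanced" search asked about above mostly comes down to how the query string is assembled. The sketch below only builds strings in the `field:term` / `+field:term` style already used earlier in this thread; the helper names and the field names (`category`, `title`, `author`, `year`) are hypothetical, which fields bare text is matched against depends on how the index and query parser are configured, and values containing double quotes are not escaped.

```ruby
# Hypothetical helpers that only build query strings; nothing here calls
# Ferret itself, and the field names used below are illustrative.

# Wrap a value in double quotes (phrase) if it contains whitespace.
# Embedded double quotes are NOT escaped in this sketch.
def clause_value(value)
  v = value.to_s.strip
  v =~ /\s/ ? "\"#{v}\"" : v
end

# "Normal" search: the user's free text plus one required field clause.
def normal_query(text, field, value)
  "#{text} +#{field}:#{clause_value(value)}"
end

# "Advanced" search: one clause per non-empty form field. With
# match_all = true every clause is required (+); otherwise the clauses
# are plain optional terms.
def advanced_query(fields, match_all = true)
  prefix = match_all ? "+" : ""
  fields.reject { |_, v| v.to_s.strip.empty? }
        .map { |f, v| "#{prefix}#{f}:#{clause_value(v)}" }
        .join(" ")
end

puts normal_query("open source", :category, "books")
puts advanced_query({ :title => "open source", :author => "", :year => 2007 })
puts advanced_query({ :title => "ruby", :author => "dave" }, false)
```

The normal form keeps the free text as typed and narrows it with one required clause; the advanced form turns each filled-in form field into its own clause, either all required or all optional.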
From florent at solt.biz Fri Mar 2 10:25:16 2007
From: florent at solt.biz (Florent Solt)
Date: Fri, 2 Mar 2007 16:25:16 +0100
Subject: [Ferret-talk] Sorted empty search bug
Message-ID:

Hello Dave, Hello all,

I get this error when I search for something and sort it by name:

Argument Error occured at :93 in xraise
Error occured in sort.c:551 - field_cache_get_index
Cannot sort by field "name". It doesn't exist in the index.

The problem occurs when my index is empty, so the field "name" does not exist.

--
Florent

From wmorgan-ferret at masanjin.net Fri Mar 2 11:35:23 2007
From: wmorgan-ferret at masanjin.net (William Morgan)
Date: Fri, 02 Mar 2007 08:35:23 -0800
Subject: [Ferret-talk] Sorted empty search bug
In-Reply-To:
References:
Message-ID: <1172853261-sup-5623@south>

Excerpts from Florent Solt's message of Fri Mar 02 07:25:16 -0800 2007:
> Argument Error occured at :93 in xraise
> Error occured in sort.c:551 - field_cache_get_index
> Cannot sort by field "name". It doesn't exist in the index.
>
> The problem occurs when my index is empty, so the field "name" does not
> exist.

What version of Ferret are you using? I submitted a patch for this a few months ago that Dave committed, though I'm not sure to which version.

From caleb at inforadical.net Fri Mar 2 11:58:52 2007
From: caleb at inforadical.net (Caleb Clausen)
Date: Fri, 02 Mar 2007 08:58:52 -0800
Subject: [Ferret-talk] FerretHash
In-Reply-To:
References:
Message-ID: <45E857CC.1010804@inforadical.net>

Dave Balmain wrote:
> Unfortunately it would be too inefficient to change all of the term
> counts when you delete a document. What you can do is optimize the
> index before you iterate over the terms.

OK. So it's a known limitation. I didn't know that.

From my at email.com Fri Mar 2 12:18:14 2007
From: my at email.com (mix)
Date: Fri, 2 Mar 2007 18:18:14 +0100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To: <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com>
References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> <20070302141341.GB30297@cordoba.webit.de> <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com>
Message-ID: <856f886daaedfc376ee3afda126ea929@ruby-forum.com>

Just one last question :) For example, there is a book named "best of open source". If I search for something like "source open" or "best source" or "source best", will Ferret find it?

From florent at solt.biz Fri Mar 2 12:43:23 2007
From: florent at solt.biz (Florent Solt)
Date: Fri, 2 Mar 2007 18:43:23 +0100
Subject: [Ferret-talk] Sorted empty search bug
In-Reply-To: <1172853261-sup-5623@south>
References: <1172853261-sup-5623@south>
Message-ID: <4898b50988545b9494baded6be5fe760@ruby-forum.com>

William Morgan wrote:
> Excerpts from Florent Solt's message of Fri Mar 02 07:25:16 -0800 2007:
>> Argument Error occured at :93 in xraise
>> Error occured in sort.c:551 - field_cache_get_index
>> Cannot sort by field "name". It doesn't exist in the index.
>>
>> The problem occurs when my index is empty, so the field "name" does not
>> exist.
>
> What version of Ferret are you using? I submitted a patch for this a few
> months ago that Dave committed, though I'm not sure to which version.

I'm using 0.11.2.

From kraemer at webit.de Fri Mar 2 12:50:25 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 2 Mar 2007 18:50:25 +0100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To: <856f886daaedfc376ee3afda126ea929@ruby-forum.com>
References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> <20070302141341.GB30297@cordoba.webit.de> <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com> <856f886daaedfc376ee3afda126ea929@ruby-forum.com>
Message-ID: <20070302175024.GD30297@cordoba.webit.de>

On Fri, Mar 02, 2007 at 06:18:14PM +0100, mix wrote:
> just a last question :)
> for example, there is a book named "best of open source", if i search
> something like "source open" or "best source" or "source best" etc,
> ferret find them, isn't it?

Usually it will. You can however construct queries that take the order
of query terms into account, if you need that.

Jens

From my at email.com Fri Mar 2 12:59:11 2007
From: my at email.com (mix)
Date: Fri, 2 Mar 2007 18:59:11 +0100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To: <20070302175024.GD30297@cordoba.webit.de>
References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> <20070302141341.GB30297@cordoba.webit.de> <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com> <856f886daaedfc376ee3afda126ea929@ruby-forum.com> <20070302175024.GD30297@cordoba.webit.de>
Message-ID: <2b906e00ac1f0043cb3a3ce2138445a7@ruby-forum.com>

Jens Kraemer wrote:
>
> Usually it will. You can however construct queries that take the order
> of query terms into account, if you need that.
>
> Jens
>

perfect :) ok ok, just the last one, and about the case sensitive? if
i've a book "Open SOURCE", with a search "source" will it find it ?
thanks :)

From dbalmain.ml at gmail.com Fri Mar 2 17:36:21 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 3 Mar 2007 09:36:21 +1100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To: <2b906e00ac1f0043cb3a3ce2138445a7@ruby-forum.com>
References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> <20070302141341.GB30297@cordoba.webit.de> <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com> <856f886daaedfc376ee3afda126ea929@ruby-forum.com> <20070302175024.GD30297@cordoba.webit.de> <2b906e00ac1f0043cb3a3ce2138445a7@ruby-forum.com>

On 3/3/07, mix wrote:
> Jens Kraemer wrote:
> >
> > Usually it will. You can however construct queries that take the order
> > of query terms into account, if you need that.
> >
> > Jens
> >
>
> perfect :) ok ok, just the last one, and about the case sensitive? if
> i've a book "Open SOURCE", with a search "source" will it find it ?
> thanks :)

Yes. You can do both case sensitive and case insensitive searches in
Ferret depending on how you set up your analyzer, but searches are case
insensitive by default, so a search for "source" will find "SOURCE".

--
Dave Balmain
http://www.davebalmain.com/

From no at spam.com Sat Mar 3 07:10:22 2007
From: no at spam.com (marco)
Date: Sat, 3 Mar 2007 13:10:22 +0100
Subject: [Ferret-talk] ferret or not ferret?
References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> <20070302141341.GB30297@cordoba.webit.de> <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com> <856f886daaedfc376ee3afda126ea929@ruby-forum.com> <20070302175024.GD30297@cordoba.webit.de> <2b906e00ac1f0043cb3a3ce2138445a7@ruby-forum.com>
Message-ID: <7a4085e85c99db8ecc8a28eed42c47b7@ruby-forum.com>

David Balmain wrote:
> On 3/3/07, mix wrote:
> Yes. You can do both case sensitive and case insensitive searches in
> Ferret depending on how you set up your analyzer, but searches are case
> insensitive by default, so a search for "source" will find "SOURCE".

perfect :)
just the last question, i promise :)
with an index of 5-10gb how does it work? because i've to save some
information in the index to use the highlight and do any query
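The two answers above, Jens's (term order usually doesn't matter) and Dave's (searches are case insensitive by default because of how the analyzer is set up), can be illustrated with a small toy model. This is a plain-Ruby sketch, not Ferret code: `toy_analyze`, `toy_match?` and `toy_phrase_match?` are hypothetical helpers, and requiring every query term to be present is a simplifying assumption, not Ferret's default query behaviour.

```ruby
# Toy model only: plain Ruby, NOT how Ferret is implemented internally.

# Lowercase the text and split it into word tokens, roughly what a
# lowercasing analyzer does before terms are written to the index.
def toy_analyze(text)
  text.downcase.scan(/\w+/)
end

# Order-insensitive match: every query term must occur somewhere in the
# document. (Requiring all terms is a simplification chosen for the toy.)
def toy_match?(doc_text, query)
  doc_terms = toy_analyze(doc_text)
  toy_analyze(query).all? { |term| doc_terms.include?(term) }
end

# Order-sensitive match, similar in spirit to a phrase query: the query
# terms must appear next to each other and in the same order.
def toy_phrase_match?(doc_text, query)
  doc_terms = toy_analyze(doc_text)
  query_terms = toy_analyze(query)
  return false if query_terms.empty?
  doc_terms.each_cons(query_terms.size).any? { |window| window == query_terms }
end

title = "Best of Open SOURCE"
puts toy_match?(title, "source open")        # true: order ignored, case folded
puts toy_match?(title, "source best")        # true
puts toy_phrase_match?(title, "open source") # true: same order
puts toy_phrase_match?(title, "source open") # false: wrong order
```

In Ferret itself, the order-sensitive behaviour Jens mentions would typically come from a phrase query; the toy only mimics the idea.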
From no at spam.com Sat Mar 3 09:09:53 2007
From: no at spam.com (mix)
Date: Sat, 3 Mar 2007 15:09:53 +0100
Subject: [Ferret-talk] Problem with ferret :(
Message-ID: <13425fbfb1e16618db1c5df1c6a057bf@ruby-forum.com>

hi, i'm trying ferret and acts_as_ferret, it's good, but i've a little
problem. i've a model book which has a title and a quantity, how can i
search using acts_as_ferret all books whose quantity is > 0 ? and
with a search like "book", how can i also find titles like "books" ? and
the last, if i search "bok", is it possible to find anyway titles with
"book"? (i saw that in ferret it is possible, but with acts_as_ferret?)
thanks

From no at spam.com Sat Mar 3 12:06:38 2007
From: no at spam.com (mix)
Date: Sat, 3 Mar 2007 18:06:38 +0100
Subject: [Ferret-talk] Problem with ferret :(
In-Reply-To: <13425fbfb1e16618db1c5df1c6a057bf@ruby-forum.com>
References: <13425fbfb1e16618db1c5df1c6a057bf@ruby-forum.com>
Message-ID: <3f231ef1fa64eb2cfb11b9b17af9a2f6@ruby-forum.com>

mix wrote:
> hi, i'm trying ferret and acts_as_ferret, it's good, but i've a little
> problem. i've a model book which has a title and a quantity, how can i
> search using acts_as_ferret all books whose quantity is > 0 ? and
> with a search like "book", how can i also find titles like "books" ? and
> the last, if i search "bok", is it possible to find anyway titles with
> "book"? (i saw that in ferret it is possible, but with acts_as_ferret?)
> thanks

i've another problem... :( how can i find only records which have
"datetime < Time.now"? datetime is like updated_at, and after this i have
to sort the records by this datetime.... i've tried for 3 hours but
nothing :( thanks :(
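For the date question above, one common approach (an assumption, not something stated in this thread) is to index the timestamp as a fixed-width, zero-padded string such as YYYYMMDDHHMMSS, so that string order matches chronological order. The range syntax from the Ferret QueryParser documentation quoted in Jan Prill's reply can then express "before now". The plain-Ruby sketch below only builds the strings: `ferret_time_key`, `before_query`, `between_query` and the field name `updated_key` are hypothetical, and with acts_as_ferret the formatted value would have to come from a model method listed in :fields.

```ruby
# Hypothetical helpers that only build strings; no Ferret calls are made.

# Fixed-width, zero-padded timestamp, so that comparing or sorting the
# strings gives the same order as comparing or sorting the times.
def ferret_time_key(time)
  time.strftime('%Y%m%d%H%M%S')
end

# Open-ended range "all values < upper", using the 'field:<value}' form
# from the QueryParser documentation.
def before_query(field, time)
  "#{field}:<#{ferret_time_key(time)}}"
end

# Lower bound inclusive, upper bound exclusive: 'field:[lower upper}'.
def between_query(field, from, to)
  "#{field}:[#{ferret_time_key(from)} #{ferret_time_key(to)}}"
end

now = Time.utc(2007, 3, 3, 18, 6, 38)
puts before_query('updated_key', now)
# prints: updated_key:<20070303180638}
puts between_query('updated_key', Time.utc(2007, 3, 1), now)
# prints: updated_key:[20070301000000 20070303180638}
```

Because every key has the same width, sorting on such a field should also give chronological order, which covers the second half of the question.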
From JanPrill at blauton.de Sat Mar 3 17:37:17 2007
From: JanPrill at blauton.de (Jan Prill)
Date: Sat, 3 Mar 2007 22:37:17 +0000
Subject: [Ferret-talk] Problem with ferret :(
In-Reply-To: <3f231ef1fa64eb2cfb11b9b17af9a2f6@ruby-forum.com>
References: <13425fbfb1e16618db1c5df1c6a057bf@ruby-forum.com> <3f231ef1fa64eb2cfb11b9b17af9a2f6@ruby-forum.com>
Message-ID: <562a35c10703031437s14adeedel858cd014eb8cf24@mail.gmail.com>

http://ferret.davebalmain.com/api/classes/Ferret/QueryParser.html

Have you had a look at the api? Here's an excerpt from the above uri
regarding range queries. Sorting of date-fields has been subject of
discussion as well: http://www.ruby-forum.com/topic/84502

Cheers,
Jan

RangeQuery

A range query finds all documents with terms between the two query terms.
This can be very useful in particular for dates, e.g.:

  'date:[20050725 20050905]'  # all dates >= 20050725 and <= 20050905
  'date:[20050725 20050905}'  # all dates >= 20050725 and <  20050905
  'date:{20050725 20050905]'  # all dates >  20050725 and <= 20050905
  'date:{20050725 20050905}'  # all dates >  20050725 and <  20050905

You can also do open-ended queries like this:

  'date:[20050725>'  # all dates >= 20050725
  'date:{20050725>'  # all dates >  20050725
  'date:<20050905]'  # all dates <= 20050905
  'date:<20050905}'  # all dates <  20050905

Or like this:

  'date: >= 20050725'
  'date: > 20050725'
  'date: <= 20050905'
  'date: < 20050905'

If you prefer the above style, you could use a boolean query like this:

  'date:( >= 20050725 AND <= 20050905)'

but the RangeQuery-only solution shown first will be faster.

From chitam at gmail.com Sun Mar 4 02:59:26 2007
From: chitam at gmail.com (donut donut)
Date: Sun, 4 Mar 2007 08:59:26 +0100
Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret
In-Reply-To: <20070204193455.GB29012@cordoba.webit.de>
References: <20070204193455.GB29012@cordoba.webit.de>
Message-ID: <9dfbb7b7672096f845c20c1b11370f57@ruby-forum.com>

Jens Kraemer wrote:
> Hi!
>
> Aaf trunk has undergone several major refactorings the last days, with
> the result that you can now transparently switch your app from local
> to remote indexing and back :-)
>
> If you plan to scale your app to more than one physical machine, or
> if you have problems with corrupted indexes and the like under high
> load, you really should give this a try.
>
> I wrote some documentation to get you started with the remote indexing
> stuff at
> http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer
>
>
> Looking forward to your feedback,
> Jens
>

Hi, thanks for the great work! Does this work with Rails 1.1.6 as I
haven't made the switch to 1.2.1 yet?

From admin at mightytofu.com Sun Mar 4 08:02:19 2007
From: admin at mightytofu.com (Ted)
Date: Sun, 4 Mar 2007 14:02:19 +0100
Subject: [Ferret-talk] Getting non-stemmed terms from IndexReader

I need to get a set of terms being indexed using Ferret. I used
IndexReader.terms and it returns a list of TermEnum nicely. The only
problem is that my analyzer includes a stemming filter.
So now, the terms I'm getting back are all stemmed. Is there any way to
get the original unstemmed terms back from the index somehow? Thanks.
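Ted's question comes down to stemming being lossy: several surface forms collapse into one indexed term, so the index alone cannot tell which original word produced it. Dave's later suggestion in this thread is to index the field a second time without a stemming analyzer, and that copy keeps the originals. A toy illustration in plain Ruby follows. `toy_stem` is a hypothetical, deliberately crude suffix stripper, not the stemmer Ferret's StemFilter uses.

```ruby
# Hypothetical, deliberately crude suffix stripper (NOT Ferret's StemFilter).
def toy_stem(word)
  word.sub(/(ing|es|s)\z/, '')
end

words = %w[test tests testing test]

# Terms a stemmed field would contain: the surface forms are gone.
stemmed_terms = words.map { |w| toy_stem(w) }.uniq
# Terms a second, unstemmed copy of the field would contain.
unstemmed_terms = words.uniq

p stemmed_terms    # prints ["test"]
p unstemmed_terms  # prints ["test", "tests", "testing"]
```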
From no at spam.com Sun Mar 4 09:49:58 2007
From: no at spam.com (mix)
Date: Sun, 4 Mar 2007 15:49:58 +0100
Subject: [Ferret-talk] Problem with ferret :(
In-Reply-To: <562a35c10703031437s14adeedel858cd014eb8cf24@mail.gmail.com>
References: <13425fbfb1e16618db1c5df1c6a057bf@ruby-forum.com> <3f231ef1fa64eb2cfb11b9b17af9a2f6@ruby-forum.com> <562a35c10703031437s14adeedel858cd014eb8cf24@mail.gmail.com>
Message-ID: <393ef4265e617435cf0ed495352900a2@ruby-forum.com>

Jan Prill wrote:
> http://ferret.davebalmain.com/api/classes/Ferret/QueryParser.html
>
> Have you had a look at the api? Here's an excerpt from the above uri
> regarding range queries. Sorting of date-fields has been subject of
> discussion as well: http://www.ruby-forum.com/topic/84502
> Cheers,
> Jan
>

yep, yesterday i looked at it but it didn't work... anyway today i tried
again and now it works, maybe the problem was something else. btw, the
date part is ok, but there is still another problem: i've a string field
which is either empty or contains a comment, how can i select all records
whose comment is empty? i've tried "comment:", "comment:''", "comment: ",
"comment:null", "comment:nil", "comment:=null", "comment:=nil", but
nothing :( i've found a workaround like this:

acts_as_ferret :fields => [:com

def com
  self.comment == '' ? '0' : self.comment
end

def self.full_text_search(query)
  return nil if query.nil? or (query == '')
  query = "com:0"
  Model.find_by_contents(query, {:limit => :all})
end

what do you think?

just the last question... i've a model which has a has_many relationship,
and i saw that when ferret retrieves the results it does a query like this:

SELECT * FROM something WHERE (something.id in ('33','22','6','11','23','12','24','13','8','25','14','9','26','15','27','16','28','17','29','31','20','19','21','32','10'))

is it possible to do a join with another table to also get the information
of the relationship?
because in the result list i have to show that information too, and with
a join i'd have n fewer queries :)

From no at spam.com Sun Mar 4 10:24:07 2007
From: no at spam.com (mix)
Date: Sun, 4 Mar 2007 16:24:07 +0100
Subject: [Ferret-talk] Problem with ferret :(
In-Reply-To: <393ef4265e617435cf0ed495352900a2@ruby-forum.com>
References: <13425fbfb1e16618db1c5df1c6a057bf@ruby-forum.com> <3f231ef1fa64eb2cfb11b9b17af9a2f6@ruby-forum.com> <562a35c10703031437s14adeedel858cd014eb8cf24@mail.gmail.com> <393ef4265e617435cf0ed495352900a2@ruby-forum.com>

i solved the first with this:

acts_as_ferret :fields => [:com

def com
  self.comment == '' ? 'null' : 'comment'
end

def self.full_text_search(query)
  return nil if query.nil? or (query == '')
  query = "com:null"
  Model.find_by_contents(query, {:limit => :all})
end

because i don't need to have the comment, so a simple 'comment' is ok...

about the join how can i do ? :(

From chad at zulu.net Sun Mar 4 10:51:51 2007
From: chad at zulu.net (Chad Thatcher)
Date: Sun, 4 Mar 2007 16:51:51 +0100
Subject: [Ferret-talk] Need clarification of documentation
Message-ID: <44337b07d5a83c4c3cec1e3a035eb3cf@ruby-forum.com>

Hi, I have a question about the delete() method docs.

I am re-indexing data on the fly so I would like to delete any existing
indexed data for a particular resource before re-indexing it using
index.delete(id).

The delete() method api doc says:

"Delete the document referenced by the document number id if id is an
integer or all of the documents which have the term id if id is a term.

id: The number of the document to delete"

I am a little confused by what this means. At the time of deletion all
I have is my own ID of the resource which was previously indexed in
ferret with my own field :id. If I supply my own ID will the correct
indexed data be deleted? Or does this ID refer to Ferret's own internal
ID for the resource?
One other question while I am on the subject - will deleting a resource
that does not exist raise an error? I ask this because I would like to
index new data structures that haven't been indexed before and would
like to avoid checking in the index first whether or not it exists
before attempting to delete.

Thanks,
Chad.

From kraemer at webit.de Sun Mar 4 16:47:30 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Sun, 4 Mar 2007 22:47:30 +0100
Subject: [Ferret-talk] Problem with ferret :(
References: <13425fbfb1e16618db1c5df1c6a057bf@ruby-forum.com> <3f231ef1fa64eb2cfb11b9b17af9a2f6@ruby-forum.com> <562a35c10703031437s14adeedel858cd014eb8cf24@mail.gmail.com> <393ef4265e617435cf0ed495352900a2@ruby-forum.com>
Message-ID: <20070304214730.GA28769@cordoba.webit.de>

On Sun, Mar 04, 2007 at 04:24:07PM +0100, mix wrote:
> i solved the first with this:
>
> acts_as_ferret :fields => [:com
>
> def com
>   self.comment == '' ? 'null' : 'comment'
> end
>
> def self.full_text_search(query)
>   return nil if query.nil? or (query == '')
>   query = "com:null"
>   Model.find_by_contents(query, {:limit => :all})
> end
>
> because i don't need to have the comment, so a simple 'comment' is ok...

that should work. however your above method completely ignores the
original query, which probably is not what you want.

query << " com:null"

is what you probably wanted to do...

> about the join how can i do ? :(

Model.find_by_contents(query, { :limit => :all }, { :include => [ :relationship ] })

the third argument to find_by_contents is a hash of options as you would
use if you selected your records with Model.find.

Jens

From kraemer at webit.de Sun Mar 4 16:49:47 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Sun, 4 Mar 2007 22:49:47 +0100
Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret
In-Reply-To: <9dfbb7b7672096f845c20c1b11370f57@ruby-forum.com>
References: <20070204193455.GB29012@cordoba.webit.de> <9dfbb7b7672096f845c20c1b11370f57@ruby-forum.com>
Message-ID: <20070304214947.GB28769@cordoba.webit.de>

On Sun, Mar 04, 2007 at 08:59:26AM +0100, donut donut wrote:
> Jens Kraemer wrote:
[..]
> >
> > I wrote some documentation to get you started with the remote indexing
> > stuff at
> > http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer
> >
>
> Hi, thanks for the great work! Does this work with Rails 1.1.6 as I
> haven't made the switch to 1.2.1 yet?

I didn't test with 1.1.6 but it should work.

Jens

From kraemer at webit.de Sun Mar 4 16:57:16 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Sun, 4 Mar 2007 22:57:16 +0100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To: <7a4085e85c99db8ecc8a28eed42c47b7@ruby-forum.com>
References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> <20070302141341.GB30297@cordoba.webit.de> <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com> <856f886daaedfc376ee3afda126ea929@ruby-forum.com> <20070302175024.GD30297@cordoba.webit.de> <2b906e00ac1f0043cb3a3ce2138445a7@ruby-forum.com> <7a4085e85c99db8ecc8a28eed42c47b7@ruby-forum.com>
Message-ID: <20070304215716.GC28769@cordoba.webit.de>

On Sat, Mar 03, 2007 at 01:10:22PM +0100, marco wrote:
> David Balmain wrote:
> > On 3/3/07, mix wrote:
> > Yes. You can do both case sensitive and case insensitive searches in
> > Ferret depending on how you set up your analyzer, but searches are case
> > insensitive by default, so a search for "source" will find "SOURCE".
>
> perfect :)
> just the last question, i promise :)
> with an index of 5-10gb how does it work? because i've to save some
> information in the index to use the highlight and do any query

Try it out :-) I didn't use such a large index yet, but I think Ferret
will be able to handle it just fine.

Jens

From no at spam.com Sun Mar 4 17:48:47 2007
From: no at spam.com (mix)
Date: Sun, 4 Mar 2007 23:48:47 +0100
Subject: [Ferret-talk] Problem with ferret :(
In-Reply-To: <20070304214730.GA28769@cordoba.webit.de>
References: <13425fbfb1e16618db1c5df1c6a057bf@ruby-forum.com> <3f231ef1fa64eb2cfb11b9b17af9a2f6@ruby-forum.com> <562a35c10703031437s14adeedel858cd014eb8cf24@mail.gmail.com> <393ef4265e617435cf0ed495352900a2@ruby-forum.com> <20070304214730.GA28769@cordoba.webit.de>
Message-ID: <2e7037ed8abdeb29db70ef3d57ad51c2@ruby-forum.com>

Jens Kraemer wrote:
>
> that should work.
> however your above method completely ignores the
> original query, which probably is not what you want.
> query << " com:null"
> is what you probably wanted to do...
>

ehm, yes... confused with = :)

> Model.find_by_contents(query, { :limit => :all }, { :include => [
> :relationship ] })
>

thanks :) for now ferret is real cool, i've done the basic search :)

From no at spam.com Sun Mar 4 17:49:55 2007
From: no at spam.com (mix)
Date: Sun, 4 Mar 2007 23:49:55 +0100
Subject: [Ferret-talk] ferret or not ferret?
In-Reply-To: <20070304215716.GC28769@cordoba.webit.de>
References: <0d32a6be8179ad9b1b3939af69342ee6@ruby-forum.com> <20070302141341.GB30297@cordoba.webit.de> <5ef2ec4de5bb3ef7e15a95ecf8d9f8fc@ruby-forum.com> <856f886daaedfc376ee3afda126ea929@ruby-forum.com> <20070302175024.GD30297@cordoba.webit.de> <2b906e00ac1f0043cb3a3ce2138445a7@ruby-forum.com> <7a4085e85c99db8ecc8a28eed42c47b7@ruby-forum.com> <20070304215716.GC28769@cordoba.webit.de>
Message-ID: <11e94f58d7aadb822c512383055f8f79@ruby-forum.com>

Jens Kraemer wrote:
> Try it out :-) I didn't use such a large index yet, but I think Ferret
> will be able to handle it just fine.
>

i hope to achieve that dimension :)

From wmorgan-ferret at masanjin.net Sun Mar 4 18:02:21 2007
From: wmorgan-ferret at masanjin.net (William Morgan)
Date: Sun, 04 Mar 2007 15:02:21 -0800
Subject: [Ferret-talk] Sorted empty search bug
In-Reply-To: <4898b50988545b9494baded6be5fe760@ruby-forum.com>
References: <1172853261-sup-5623@south> <4898b50988545b9494baded6be5fe760@ruby-forum.com>
Message-ID: <1173049161-sup-2661@south>

Excerpts from Florent Solt's message of Fri Mar 02 09:43:23 -0800 2007:
> >> The problem occurs when my index is empty, so the field "name" does not
> >> exist.
> >
> > What version of Ferret are you using? I submitted a patch for this a few
> > months ago that Dave committed, though I'm not sure to which version.
>
> I'm using 0.11.2.

Ok, I'm actually thinking of a patch for a similar, but different issue.
In this case Ferret's behavior actually strikes me as correct. You haven't
defined the field, right? Does the error still occur with an empty index
but where the field has been defined, e.g. via FieldInfos#create_index?

--
William

From matt at mattschnitz.com Sun Mar 4 21:15:58 2007
From: matt at mattschnitz.com (Matt Schnitz)
Date: Sun, 4 Mar 2007 18:15:58 -0800
Subject: [Ferret-talk] Is indexing slower?
Message-ID: <497cc4a0703041815h556563dbi94f95414786f49c5@mail.gmail.com>

Hi -

I upgraded to Ferret 0.11.3 from 0.10.13.

I used to index 10,000 records in 10 secs. Now it takes 13 minutes.
(That's a factor of ~75x)

Did something change in the flush semantics, or something?

Thanks!
Schnitz

From Neville.Burnell at bmsoft.com.au Sun Mar 4 23:06:37 2007
From: Neville.Burnell at bmsoft.com.au (Neville Burnell)
Date: Mon, 5 Mar 2007 15:06:37 +1100
Subject: [Ferret-talk] Warming up a new Searcher/Reader (Ferret 0.10.9 win32)
Message-ID: <126EC586577FD611A28E00A0C9A03758B5C5A2@maui.bmsoft.com.au>

Hi,

I have a largish index [700MB] which is updated from time to time,
requiring me to close and recreate the Ferret::Search::Searcher to use
the latest index.

My problem is that the first few searches on the new index are slow [by
comparison to before the close/recreate], I'm guessing because the new
index is being loaded into RAM by my OS and into Ferret as needed.

I'm thinking of "warming up" the OS and Ferret by processing a query or
two, and I'd appreciate any details about the way Ferret loads the
index, so that I can construct a good "warm up" query.

Kind Regards

Neville

From adam.thorsen at gmail.com Sun Mar 4 23:13:24 2007
From: adam.thorsen at gmail.com (Adam Thorsen)
Date: Mon, 5 Mar 2007 05:13:24 +0100
Subject: [Ferret-talk] ferret finds 'tests' but not 'test'
In-Reply-To: <99502401-110C-40D0-8B23-918A040EA6E3@gmx.net>
References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> <490ff92ace22fc678e620105f75bc5b3@ruby-forum.com> <6EABC590-396E-4CB6-A289-56E7D4CB970B@gmx.net> <99502401-110C-40D0-8B23-918A040EA6E3@gmx.net>

Andreas Korth wrote:
> Hi Caspar,
>
> On 27.10.2006, at 11:58, Ghost wrote:
>
>> NameError: uninitialized constant MyAnalyzer
> Sorry, I forgot to mention that the directory structure needs to
> resemble the module nesting, i.e. the file must go in app/models/
> ferret/analysis instead of just app/models.
>
> Cheers,
> Andy

I've been trying to use the solution for stemming discussed in this
thread and have run into a bit of trouble. I'm using this analyzer:

module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

I've configured aaf thusly:

AAF_DEFAULT_FERRET_OPTIONS = {:analyzer => Ferret::Analysis::StemmingAnalyzer.new}

acts_as_ferret({:store_class_name => true, :fields => {:description => {:store => :yes}}}.merge(AAF_DEFAULT_OPTIONS), AAF_DEFAULT_FERRET_OPTIONS)

The first time I search for something a new index is created in index,
and it successfully returns a set of results.
The second time I search, however, I get a strange error:

uninitialized constant Ferret::Search
#{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies.rb:264:in `load_missing_constant'
#{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies.rb:453:in `const_missing'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:160:in `query_for_record'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:152:in `document_number'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:135:in `highlight'
/opt/local/lib/ruby/1.8/monitor.rb:238:in `synchronize'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:134:in `highlight'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:30:in `highlight'

Perhaps it has something to do with loading an already created index?

Thanks,
-Adam

From jgelens at gmail.com Mon Mar 5 02:52:08 2007
From: jgelens at gmail.com (Jeffrey Gelens)
Date: Mon, 5 Mar 2007 08:52:08 +0100
Subject: [Ferret-talk] Problem with large index file
Message-ID: <76885f200ff968a2fc0a3eef8c8f8a68@ruby-forum.com>

I recreated the index with this option :max_merge_docs => 100000 and it
seems to work great.

From henke at mac.se Mon Mar 5 04:14:41 2007
From: henke at mac.se (Henrik Zagerholm)
Date: Mon, 5 Mar 2007 10:14:41 +0100
Subject: [Ferret-talk] Using act_as_ferret with find_by_sql

Hello,

I wonder if it's possible to combine ferret queries with find_by_sql
queries? Or should I try to rewrite my query using find and then use
find_by_contents when I'm done?

Thanks for a great product!
Regards,
henrik

From adam.thorsen at gmail.com Mon Mar 5 11:25:50 2007
From: adam.thorsen at gmail.com (Adam Thorsen)
Date: Mon, 5 Mar 2007 17:25:50 +0100
Subject: [Ferret-talk] programatically stopping acts_as_ferret drb server

I need a way to kill the ferret_server drb process programmatically, so I
can start/stop it as part of the capistrano deployment process.

This should be as simple as adding some sort of stop method to
ActsAsFerret::Remote::Server. I was just messing around and was able to
do it by modifying method_missing to look for the :stop method and then
calling DRb.thread.exit -- this is not good enough for a general
solution however.

If anyone has an idea of how it should be done, I can do it and submit a
patch.

Thanks,
-Adam

From kraemer at webit.de Mon Mar 5 11:53:27 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 5 Mar 2007 17:53:27 +0100
Subject: [Ferret-talk] programatically stopping acts_as_ferret drb server
Message-ID: <20070305165327.GE25625@cordoba.webit.de>

Hi!

On Mon, Mar 05, 2007 at 05:25:50PM +0100, Adam Thorsen wrote:
> I need a way to kill the ferret_server drb process programmatically, so I
> can start/stop it as part of the capistrano deployment process.
>
> This should be as simple as adding some sort of stop method to
> ActsAsFerret::Remote::Server. I was just messing around and was able to
> do it by modifying method_missing to look for the :stop method and then
> calling DRb.thread.exit -- this is not good enough for a general
> solution however.
>
> If anyone has an idea of how it should be done, I can do it and submit a
> patch.

I could imagine a set of start/stop scripts where the start script
launched the server as a daemon and recorded its pid somewhere. stop
then only had to read that pid and kill the process. Those scripts could
then easily be called from cap recipes.
Something like Daemonize (http://grub.ath.cx/daemonize/) might come in
handy.

I'd really appreciate it if you could tackle that subject :-)

cheers,
Jens

From cwaters at networkchemistry.com Mon Mar 5 12:14:33 2007
From: cwaters at networkchemistry.com (Waters, Chris)
Date: Mon, 5 Mar 2007 12:14:33 -0500
Subject: [Ferret-talk] Cost of using many fields
Message-ID: <8D2B87EFB35A4341B53B86D3DE6F4CD501481E03@mse1be1.mse1.mailstreet.com>

Hi,

In ferret, and especially when using acts_as_ferret, it is easy to
specify many fields. What is the cost of using a lot of fields from a
performance perspective? Is each field searched separately, or are they
combined together in the inverted index?

As an extreme example, if I made every word in my documents a separate
field (so the first word in each document was field 1 and the second
word was field 2, etc.), would this be significantly less efficient than
treating the entire document as a single field?

I am not doing something quite as bad as this hypothetical example, but
I am investigating different ways to organize some data.

Thanks,
Chris.

From joestelmach at gmail.com Mon Mar 5 16:29:37 2007
From: joestelmach at gmail.com (Joe Stelmach)
Date: Mon, 5 Mar 2007 22:29:37 +0100
Subject: [Ferret-talk] Lucene index compatibility
Message-ID: <05b3acd9fed1492c88fc13d24c41f1dd@ruby-forum.com>

I would like to generate a Lucene index in Java (of plain text values
only) and be able to use that index in ferret. I've seen many mixed
answers to this question, so I'm hoping some of you can help.

Thanks!
From adam.thorsen at gmail.com Mon Mar 5 20:05:20 2007
From: adam.thorsen at gmail.com (Adam Thorsen)
Date: Tue, 6 Mar 2007 02:05:20 +0100
Subject: [Ferret-talk] programatically stopping acts_as_ferret drb server
In-Reply-To: <20070305165327.GE25625@cordoba.webit.de>
References: <20070305165327.GE25625@cordoba.webit.de>

Ok I've put together a couple of scripts that are basically a hodgepodge
of daemonize, mongrel, and aaf code that start and stop ferret_server.

The start script looks for the pid_file value in ferret_server.yml, then
creates a pid file in config based on that value, forks, and exits.

The stop script looks for the pid_file in the same place and sends the
TERM signal. Upon receiving the TERM signal, the script that launched
ferret_server removes the pid file and exits.

Here is the start script:

#!/usr/bin/env script/runner

config = YAML.load(ERB.new(IO.read("#{RAILS_ROOT}/config/ferret_server.yml")).result)[RAILS_ENV]
@pid_file = "#{RAILS_ROOT}/#{config['pid_file']}"

def write_pid_file
  open(@pid_file, "w") { |f| f.write(Process.pid) }
end

# Fork, retrying while the system temporarily refuses new processes.
# Returns the child's pid in the parent and nil in the child.
def safefork
  tryagain = true
  while tryagain
    tryagain = false
    begin
      if pid = fork
        return pid
      end
    rescue Errno::EWOULDBLOCK
      sleep 5
      tryagain = true
    end
  end
end

safefork and exit   # the parent exits; the child carries on as the daemon
write_pid_file
at_exit do
  File.unlink(@pid_file) if @pid_file and File.exists?(@pid_file)
end

puts "Starting ferret_server..."
trap("TERM") { exit(0) }

sess_id = Process.setsid
STDIN.reopen "/dev/null"        # Free file descriptors and
STDOUT.reopen "/dev/null", "a"  # point them somewhere sensible
STDERR.reopen STDOUT            # STDOUT/STDERR should go to a logfile

ActsAsFerret::Remote::Server.start
DRb.thread.join

Here is the stop script:

#!/usr/bin/env script/runner

config = YAML.load(ERB.new(IO.read("#{RAILS_ROOT}/config/ferret_server.yml")).result)[RAILS_ENV]

def send_signal(signal, pid_file)
  pid = File.read(pid_file).to_i
  print "Sending #{signal} to ferret_server at PID #{pid}..."
  begin
    Process.kill(signal, pid)
  rescue Errno::ESRCH
    puts "Process does not exist. Not running."
  end
  puts "Done."
end

# Resolve the pid file the same way the start script does.
pid_file = "#{RAILS_ROOT}/#{config['pid_file']}"
puts "Stopping ferret_server..."
send_signal("TERM", pid_file)

Haven't really tested them very much but they seem to be working.

Jens Kraemer wrote:
> Hi!
>
> On Mon, Mar 05, 2007 at 05:25:50PM +0100, Adam Thorsen wrote:
>> patch.
> I could imagine a set of start/stop scripts where the start script
> launched the server as a daemon and recorded its pid somewhere. stop
> then only had to read that pid and kill the process. Those scripts could
> then easily be called from cap recipes.
>
> Something like Daemonize (http://grub.ath.cx/daemonize/) might come in
> handy.
>
> I'd really appreciate it if you could tackle that subject :-)
>
> cheers,
> Jens

From dbalmain.ml at gmail.com Mon Mar 5 21:13:38 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 6 Mar 2007 13:13:38 +1100
Subject: [Ferret-talk] Getting non-stemmed terms from IndexReader

On 3/5/07, Ted wrote:
> I need to get a set of terms being indexed using Ferret. I used
> IndexReader.terms and it returns a list of TermEnum nicely. The only
> problem is that my analyzer includes a stemming filter.
> So now, the terms I'm getting back are all stemmed. Is there any way to
> get the original unstemmed terms back from the index somehow? Thanks.

Hi Ted,

Unfortunately this isn't really possible. What I'd recommend is indexing
the field twice: once with a stemming analyzer and once without. See
PerFieldAnalyzer:

http://ferret.davebalmain.com/api/classes/Ferret/Analysis/PerFieldAnalyzer.html

Hope that helps.
Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Mon Mar 5 21:41:48 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 13:41:48 +1100 Subject: [Ferret-talk] Need clarification of documentation In-Reply-To: <44337b07d5a83c4c3cec1e3a035eb3cf@ruby-forum.com> References: <44337b07d5a83c4c3cec1e3a035eb3cf@ruby-forum.com> Message-ID: On 3/5/07, Chad Thatcher wrote: > > Hi, I have a question about the delete() method docs. > > I am re-indexing data on the fly so I would like to delete any existing > indexed data for a particular resource before re-indexing it using > index.delete(id). > > The delete() method API doc says: > > "Delete the document referenced by the document number id if id is an > integer or all of the documents which have the term id if id is a term. > > id: The number of the document to delete" > > I am a little confused by what this means. Is this any clearer?

# Deletes a document/documents from the index. The method for determining
# the document to delete depends on the type of the argument passed.
#
# If +arg+ is an Integer then delete the document based on the internal
# document number.
#
# If +arg+ is a String then search for the documents with +arg+ in the
# +id+ field. The +id+ field is either :id or whatever you set the :id_field
# parameter to when you create the Index object.

> At the time of deletion all > I have is my own ID of the resource which was previously indexed in > Ferret with my own field :id. If I supply my own ID will the correct > indexed data be deleted? Or does this ID refer to Ferret's own internal > ID for the resource? In this case, since your id is probably an integer, you will need to convert it to a string, or Ferret will delete the documents by internal document number rather than by your own ID for the resource. > One other question while I am on the subject - will deleting a resource > that does not exist raise an error?
I ask this because I would like to > index new data structures that haven't been indexed before and would > like to avoid checking in the index first whether or not it exists > before attempting to delete. Yes, if you delete by internal document number. No, if you are deleting by term, i.e. passing your own document id which is stored in the *id* field. So in your case you should be fine. I should also mention that you can set the :key parameter to :id:

index = Ferret::Index::Index.new(:key => :id)

This way, whenever you add a document with an id that already exists in the index, it will replace the existing document. For example:

require 'rubygems'
require 'ferret'

index = Ferret::I.new(:key => :id)
[
  {:id => '1', :text => 'one'},
  {:id => '2', :text => 'Two'},
  {:id => '3', :text => 'Three'},
  {:id => '1', :text => 'One'}
].each {|doc| index << doc}

puts index.size                        # => 3
puts index['1'].load.inspect           # => {:text=>"One", :id=>"1"}
puts index.search('id:1').to_s(:text)
# => TopDocs: total_hits = 1, max_score = 1.287682 [
#            3 "One": 1.287682
#    ]

Hope that helps, Dave -- Dave Balmain http://www.davebalmain.com/ From admin at mightytofu.com Mon Mar 5 21:46:09 2007 From: admin at mightytofu.com (Ted) Date: Tue, 6 Mar 2007 03:46:09 +0100 Subject: [Ferret-talk] Getting non-stemmed terms from IndexReader In-Reply-To: References: Message-ID: Thanks for the response. This is exactly what I did: indexing the field twice and then using different analyzers for both. David Balmain wrote: > On 3/5/07, Ted wrote: >> I need to get a set of terms being indexed using Ferret. I used >> IndexReader.terms and it returns a list of TermEnum nicely. The only >> problem is that my analyzer includes a stemming filter. >> So now, the terms I'm getting back are all stemmed. Is there any way to >> get the original unstemmed terms back from the index somehow? Thanks. > > Hi Ted, > > Unfortunately this isn't really possible.
What I'd recommend is > indexing the field twice; once with a stemming analyzer and once > without. See PerFieldAnalyzer; > > http://ferret.davebalmain.com/api/classes/Ferret/Analysis/PerFieldAnalyzer.html > > Hope that helps. > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Mon Mar 5 21:49:05 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 13:49:05 +1100 Subject: [Ferret-talk] Sorted empty search bug In-Reply-To: <1173049161-sup-2661@south> References: <1172853261-sup-5623@south> <4898b50988545b9494baded6be5fe760@ruby-forum.com> <1173049161-sup-2661@south> Message-ID: On 3/5/07, William Morgan wrote: > Excerpts from Florent Solt's message of Fri Mar 02 09:43:23 -0800 2007: > > >> The problem occurs when my index is empty, so the field "name" does not > > >> exist. > > > > > > What version of Ferret are you using? I submitted a patch for this a few > > > months ago that Dave committed, though I'm not sure to which version. > > > > I'm using 0.11.2. > > Ok, I'm actually thinking of a patch for a similar, but different issue. > > In this case Ferret's behavior actually strikes me as correct. You > haven't defined the field, right? Does the error still occur with an > empty index but where the field has been defined, e.g. via > FieldInfos#create_index? Hi Florent, I agree with William here. I think the behavior is correct, as it is impossible to sort by a field which doesn't exist. However, it only took one extra line of code to make searches return an empty result set whenever the index is empty, so I added it. You should no longer get an exception in this situation in 0.11.3.
Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From admin at mightytofu.com Mon Mar 5 21:58:39 2007 From: admin at mightytofu.com (Ted) Date: Tue, 6 Mar 2007 03:58:39 +0100 Subject: [Ferret-talk] Getting non-stemmed terms from IndexReader In-Reply-To: References: Message-ID: I encountered another problem: After I removed docs from the index, the doc_freq returned by IndexReader.terms is not updated. It always shows the old number, or a bigger number after more docs with that term are added. So it looks like the doc_freq is not updated correctly on removal of a doc. David Balmain wrote: > On 3/5/07, Ted wrote: >> I need to get a set of terms being indexed using Ferret. I used >> IndexReader.terms and it returns a list of TermEnum nicely. The only >> problem is that my analyzer includes a stemming filter. >> So now, the terms I'm getting back are all stemmed. Is there any way to >> get the original unstemmed terms back from the index somehow? Thanks. > > Hi Ted, > > Unfortunately this isn't really possible. What I'd recommend is > indexing the field twice; once with a stemming analyzer and once > without. See PerFieldAnalyzer; > > http://ferret.davebalmain.com/api/classes/Ferret/Analysis/PerFieldAnalyzer.html > > Hope that helps. > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Mon Mar 5 22:03:07 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 14:03:07 +1100 Subject: [Ferret-talk] Is indexing slower? In-Reply-To: <497cc4a0703041815h556563dbi94f95414786f49c5@mail.gmail.com> References: <497cc4a0703041815h556563dbi94f95414786f49c5@mail.gmail.com> Message-ID: On 3/5/07, Matt Schnitz wrote: > Hi - I upgraded to Ferret 0.11.3 from 0.10.13. > > I used to index 10,000 records in 10 secs. Now it takes 13 minutes. > (That's a factor of ~75x) > > Did something change in the flush semantics, or something? Hi Matt, The opening of an index takes a little longer now.
I guess if you have the index set to :auto_flush then it could take a fair bit longer, but I didn't expect it to take that much longer. Unfortunately this slowdown was a price I had to pay to prevent the segfault and FileNotFound errors that people were getting. Having said that, you shouldn't have :auto_flush set when you are batch indexing anyway. If you send me a benchmark which approximates what you are doing, I'd be happy to take a look at it for you and tell you how to make it faster, or add a fix to Ferret if the problem does happen to be at this end. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Mon Mar 5 22:15:27 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 14:15:27 +1100 Subject: [Ferret-talk] Problem with large index file In-Reply-To: <76885f200ff968a2fc0a3eef8c8f8a68@ruby-forum.com> References: <76885f200ff968a2fc0a3eef8c8f8a68@ruby-forum.com> Message-ID: Hi Jeffrey, That's great to hear. If you have a chance, could you try copying the index (cp -r), then opening the copy and optimizing it? Then let me know if you are still getting the same problem you were getting before. I understand if this is too much trouble; 5GB is a lot of data to be playing around with. Cheers, Dave On 3/5/07, Jeffrey Gelens wrote: > I recreated the index with this option :max_merge_docs => 100000 and it > seems to work great. > > -- > Posted via http://www.ruby-forum.com/.
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Mon Mar 5 22:21:59 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 14:21:59 +1100 Subject: [Ferret-talk] Cost of using many fields In-Reply-To: <8D2B87EFB35A4341B53B86D3DE6F4CD501481E03@mse1be1.mse1.mailstreet.com> References: <8D2B87EFB35A4341B53B86D3DE6F4CD501481E03@mse1be1.mse1.mailstreet.com> Message-ID: On 3/6/07, Waters, Chris wrote: > In Ferret, and especially when using acts_as_ferret, it is easy to specify > many fields. What is the cost of using a lot of fields from a performance > perspective? Is each field searched separately, or are they combined > together in the inverted index? Hi Chris, Each field is searched separately, so the more fields you search, the longer the search will take. Also note that there shouldn't be any difference in the time to search a single field whether you have 1 field or 1 million. It will only take longer if you search all 1 million fields. > As an extreme example, if I made every word in my documents a separate field > (so the first word in each document was field 1 and the second word was > field 2, etc.) would this be significantly less efficient than treating the > entire document as a single field? > > > > I am not doing something quite as bad as this hypothetical example, but I am > investigating different ways to organize some data. I'm not sure exactly what you want to do, but you may want to look at span queries. These queries allow you to search based on the positions of the terms in the document. But perhaps your hypothetical is misleading me.
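Span queries depend on the term positions stored in the index. The toy positional index below is a plain-Ruby sketch of that idea, not Ferret's SpanNearQuery: the class name `PositionalIndex`, the `near` method and its `max_distance` parameter are hypothetical, and the distance rule is simplified compared with real slop semantics.

```ruby
# Toy positional index: term => { doc_id => [positions] }.
# Sketches the idea behind span queries; NOT Ferret's SpanNearQuery.
class PositionalIndex
  def initialize
    @postings = Hash.new { |h, term| h[term] = Hash.new { |d, id| d[id] = [] } }
  end

  # Record the position of every token in the document.
  def add(doc_id, text)
    text.downcase.scan(/\w+/).each_with_index do |token, pos|
      @postings[token][doc_id] << pos
    end
  end

  # Ids of documents where term_a and term_b occur at most `max_distance`
  # positions apart, in either order.
  def near(term_a, term_b, max_distance)
    a_docs = @postings.fetch(term_a, {})
    b_docs = @postings.fetch(term_b, {})
    (a_docs.keys & b_docs.keys).select do |id|
      a_docs[id].any? { |pa| b_docs[id].any? { |pb| (pa - pb).abs <= max_distance } }
    end.sort
  end
end

idx = PositionalIndex.new
idx.add(1, "quick brown fox")
idx.add(2, "quick lazy sleepy brown dog")
p idx.near("quick", "brown", 1) # => [1]
p idx.near("quick", "brown", 3) # => [1, 2]
```

Position-aware matching like this lets a single field answer "which word came where" questions, instead of encoding each word position as its own field.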
Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Mon Mar 5 22:25:37 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 14:25:37 +1100 Subject: [Ferret-talk] Getting non-stemmed terms from IndexReader In-Reply-To: References: Message-ID: On 3/6/07, Ted wrote: > I encountered another problem: > > After I removed docs from the index, the doc_freq returned by > IndexReader.terms is not updated. It always shows the old number, or a > bigger number after more docs with that term are added. > So it looks like the doc_freq is not updated correctly on removal of a > doc. This is impossible to fix without ruining performance. To fix this problem I would basically need to optimize the index after every deletion. In fact, you can do this yourself if you like: just optimize the index whenever you need to rely on the doc frequency being correct and you have possible deletions in the index. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Mon Mar 5 22:32:36 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 14:32:36 +1100 Subject: [Ferret-talk] Warming up a new Searcher/Reader (Ferret 0.10.9 win32) In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5C5A2@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5C5A2@maui.bmsoft.com.au> Message-ID: On 3/5/07, Neville Burnell wrote: > > > > Hi, > > I have a largish index [700MB] which is updated from time to time, > requiring me to close and recreate the Ferret::Search::Searcher to use the > latest index. > > My problem is that the first few searches on the new index are slow [by > comparison to before the close/recreate], I'm guessing because the new index > is being loaded into RAM by my OS and into Ferret as needed.
> > I'm thinking of "warming up" the OS and Ferret by processing a query or two, > and I'd appreciate any details about the way Ferret loads the index, so that > I can construct a good "warm up" query. Hi Neville, Ferret loads the index for each field when it is searched. So if you only search one field, only that field's index will be loaded. Once each field's index is loaded, Ferret should be fully warmed up as far as simple queries like phrase and boolean go. If you are sorting search results, then sort indexes will also need to be built. A new sort index is built for each sorted field depending on sort type (int, float, string, byte) and sort direction (normal and reverse). So if you are sorting your search results, you will also need to try each type of sort that you might use to warm up that part of the index. That's all I can think of at the moment. One thing I have planned for the future is adding the ability to autoload all the indexes and to save sort indexes rather than building them each time you open an index reader. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Mon Mar 5 22:36:20 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 14:36:20 +1100 Subject: [Ferret-talk] Lucene index compatibility In-Reply-To: <05b3acd9fed1492c88fc13d24c41f1dd@ruby-forum.com> References: <05b3acd9fed1492c88fc13d24c41f1dd@ruby-forum.com> Message-ID: On 3/6/07, Joe Stelmach wrote: > I would like to generate a Lucene index in Java (of plain text values > only), and be able to use that index in Ferret. I've seen many mixed > answers to this question, so I'm hoping some of you can help. > > Thanks! Hi Joe, Firstly, why do you want to generate the index with Java? I just want to make sure that performance isn't the reason, because you'll probably get better performance with Ferret. If there is another reason you want to use Java to build the index, then unfortunately there is no way to read the index with Ferret.
Ferret now uses a very different index file format. But all is not lost. Look at solr-ruby. This will allow you to use your Lucene indexes from Ruby. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From Neville.Burnell at bmsoft.com.au Mon Mar 5 22:52:38 2007 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Tue, 6 Mar 2007 14:52:38 +1100 Subject: [Ferret-talk] Warming up a new Searcher/Reader (Ferret 0.10.9win32) References: <126EC586577FD611A28E00A0C9A03758B5C5A2@maui.bmsoft.com.au> Message-ID: <126EC586577FD611A28E00A0C9A03758B5C5BE@maui.bmsoft.com.au> Thanks for the detail Dave! > One thing I have planned for the future is adding the ability > to autoload all the indexes Awesome - this sounds like just what I'm looking for. Cheers, Nev > -----Original Message----- > From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk- > bounces at rubyforge.org] On Behalf Of David Balmain > Sent: Tuesday, 6 March 2007 2:33 PM > To: ferret-talk at rubyforge.org > Subject: Re: [Ferret-talk] Warming up a new Searcher/Reader (Ferret > 0.10.9win32) > > On 3/5/07, Neville Burnell wrote: > > > > > > > > Hi, > > > > I have a largish index [700MB] which is updated from time to time, > > requiring me to close and recreate the Ferret::Search::Searcher to > use the > > latest index. > > > > My problem is that the first few searches on the new index are slow > [by > > comparison to before the close/recreate], I'm guessing because the > new index > > is being loaded into RAM by my OS and into Ferret as needed. > > > > I'm thinking of "warming up" the OS and Ferret by processing a query > or two, > > and I'd appreciate any details about the way Ferret loads the index, > so that > > I can construct a good "warm up" query. > > Hi Neville, > > Ferret loads the index for each field when it is searched. So if you > only search one field, only that field's index will be loaded. 
> > Once each field's index is loaded, Ferret should be fully warmed up as > far as simple queries like phrase and boolean go. If you are sorting > search results, then sort indexes will also need to be built. A new > sort index is built for each sorted field depending on sort type (int, > float, string, byte) and sort direction (normal and reverse). So if > you are sorting your search results you will also need to try each > type of sort that you might use to warm up that part of the index. > > That's all I can think of at the moment. One thing I have planned for > the future is adding the ability to autoload all the indexes and to > save sort indexes rather than building them each time you open an > index reader. > > Cheers, > Dave > > -- > Dave Balmain > http://www.davebalmain.com/ From matt at mattschnitz.com Mon Mar 5 23:05:52 2007 From: matt at mattschnitz.com (Matt Schnitz) Date: Mon, 5 Mar 2007 20:05:52 -0800 Subject: [Ferret-talk] Is indexing slower? In-Reply-To: References: <497cc4a0703041815h556563dbi94f95414786f49c5@mail.gmail.com> Message-ID: <497cc4a0703052005o3d1dbf99i3990a340c3ce0096@mail.gmail.com> Figured it out. It's interesting, academically. I was flushing every time I added something to the index. I forget exactly why I thought that was a good idea, but that's what I was doing. Apparently, that's a bad idea under 0.11. Worked fine under 0.10, not so fine under 0.11. Like, 75x less fine. Now I'm just flushing after every batch. It's back to reasonable, 5-seconds-for-10,000-small-records performance. Thanks for all your help, Dave. Schnitz On 3/5/07, David Balmain wrote: > > On 3/5/07, Matt Schnitz wrote: > > Hi - I upgraded to Ferret 0.11.3 from 0.10.13. > > > > I used to index 10,000 records in 10 secs. Now it takes 13 minutes.
> > (That's a factor of ~75x) > > > > Did something change in the flush semantics, or something? > > Hi Matt, > > The opening of an index takes a little longer now. I guess if you have > the index set to :auto_flush then it could take a fair bit longer but > I didn't expect it to take that much longer. Unfortunately this > slowdown was a price I had to pay to prevent the segfault and > FileNotFound errors that people were getting. Having said that, you > shouldn't have :auto_flush set when you are batch indexing anyway. > > If you send me a benchmark which approximates what you are doing, I'd > be happy to take a look at it for you and tell you how to make it > faster or add a fix to Ferret if the problem does happen to be at this > end. > > Cheers, > Dave > > -- > Dave Balmain > http://www.davebalmain.com/ From cwaters at networkchemistry.com Mon Mar 5 23:13:45 2007 From: cwaters at networkchemistry.com (Waters, Chris) Date: Mon, 5 Mar 2007 23:13:45 -0500 Subject: [Ferret-talk] Cost of using many fields In-Reply-To: References: <8D2B87EFB35A4341B53B86D3DE6F4CD501481E03@mse1be1.mse1.mailstreet.com> Message-ID: <8D2B87EFB35A4341B53B86D3DE6F4CD5014FC24C@mse1be1.mse1.mailstreet.com> Thanks, that answers my question. My example was purely hypothetical, but I really am contemplating having hundreds of fields.
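Dave's point that each field is searched separately can be modeled with a toy per-field index. This is a plain-Ruby sketch that assumes every field has its own term dictionary, as his reply describes; `FieldedIndex` and its lookup counter are hypothetical and say nothing about Ferret's real timings. It only shows that the work grows with the number of fields queried, not with the number of fields that exist.

```ruby
# Toy per-field inverted index: field => { term => [doc ids] }.
# Hypothetical model of "one term dictionary per field"; not Ferret's internals.
class FieldedIndex
  attr_reader :lookups

  def initialize
    @fields = Hash.new { |h, field| h[field] = Hash.new { |t, term| t[term] = [] } }
    @lookups = 0 # number of per-field dictionary lookups performed so far
  end

  def add(doc_id, doc)
    doc.each do |field, text|
      text.downcase.scan(/\w+/).each { |term| @fields[field][term] << doc_id }
    end
  end

  # Look up `term` only in the requested fields: one dictionary lookup per field.
  def search(term, fields)
    fields.flat_map do |field|
      @lookups += 1
      @fields.fetch(field, {}).fetch(term, [])
    end.uniq.sort
  end
end

# One document with 500 fields, mimicking "hundreds of fields".
doc = (1..500).map { |i| [:"f#{i}", "word#{i}"] }.to_h
fielded = FieldedIndex.new
fielded.add(1, doc)

fielded.search("word7", [:f7])
puts fielded.lookups # => 1: only the :f7 dictionary was consulted
fielded.search("word7", doc.keys)
puts fielded.lookups # => 501: 500 more lookups when every field is searched
```

Under this model, having hundreds of fields is cheap as long as each query names only the few fields it needs.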
Regards, Chris > -----Original Message----- > From: ferret-talk-bounces at rubyforge.org > [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of David Balmain > Sent: Monday, March 05, 2007 7:22 PM > To: ferret-talk at rubyforge.org > Subject: Re: [Ferret-talk] Cost of using many fields > > On 3/6/07, Waters, Chris wrote: > > In ferret, and especially when using acts_as_ferret, it is easy to > > specify many fields. What is the cost of using a lot of > fields from a > > performance perspective? Is each field searched separately, or are > > they combined together in the inverted index. > > Hi Chris, > > Each field is searched separately so the more fields you > search the longer the search will take. Also note that there > shouldn't be any difference in the time to search a single > field whether you have 1 field or 1 million. It will only > take longer if you search all 1 million fields. > > > As an extreme example, if I made every word in my documents > a separate > > field (so the first word in each document was field 1 and > the second > > word was field 2, etc) would this be significantly less > efficient than > > treating the entire document as a single field? > > > > > > > > I am not doing something quite as bad as this hypothetical example, > > but I am investigating different ways to organize some data. > > I'm not sure exactly what you want to do but you may want to > look at span queries. These queries allow you to search based > on the positions of the terms in the document. But perhaps > your hypothetical is misleading me. 
> > Cheers, > Dave > > -- > Dave Balmain > http://www.davebalmain.com/ From sanjay at jarna.com Tue Mar 6 00:17:13 2007 From: sanjay at jarna.com (Sanjay Kapoor) Date: Tue, 6 Mar 2007 06:17:13 +0100 Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret In-Reply-To: <20070304214947.GB28769@cordoba.webit.de> References: <20070204193455.GB29012@cordoba.webit.de> <9dfbb7b7672096f845c20c1b11370f57@ruby-forum.com> <20070304214947.GB28769@cordoba.webit.de> Message-ID: <46c98b69e8256bd86b075184ed9e286c@ruby-forum.com> I'm using Rails 1.1.6 with acts_as_ferret and DRb. It seems to work for basic queries, but I've run into a problem when sorting and using the :limit and :offset options for pagination. I find that the query results are no longer sorted by the sort field, and I seem to get the same results irrespective of the :limit and :offset parameters. If I don't sort the results, the :limit and :offset parameters work as expected when using DRb. When I removed DRb from the setup, the sorting and pagination options worked as expected. Has anyone else come across this problem? Sanjay Jens Kraemer wrote: > On Sun, Mar 04, 2007 at 08:59:26AM +0100, donut donut wrote: >> Jens Kraemer wrote: > [..] >> > >> > I wrote some documentation to get you started with the remote indexing >> > stuff at >> > http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer >> > >> >> Hi, thanks for the great work! Does this work with Rails 1.1.6 as I >> haven't made the switch to 1.2.1 yet? > > I didn't test with 1.1.6 but it should work. > > Jens > > -- > Jens Krämer > webit!
Gesellschaft für neue Medien mbH > Schnorrstraße 76 | 01069 Dresden > Telefon +49 351 46766-0 | Telefax +49 351 46766-66 > kraemer at webit.de | www.webit.de > > Amtsgericht Dresden | HRB 15422 > GF Sven Haubold, Hagen Malessa -- Posted via http://www.ruby-forum.com/. From joestelmach at gmail.com Tue Mar 6 00:32:07 2007 From: joestelmach at gmail.com (Joe Stelmach) Date: Tue, 6 Mar 2007 06:32:07 +0100 Subject: [Ferret-talk] Lucene index compatibility In-Reply-To: References: <05b3acd9fed1492c88fc13d24c41f1dd@ruby-forum.com> Message-ID: Thanks for the fast reply, Dave. I'm working on a system that does some back-end processing in Java, but uses Rails on the front end. I'd really like to ditch Java completely, but the threading support in Ruby is very limiting (at least as far as I understand it), so I'm stuck trying to glue the pieces together. Thanks for the Solr tip - I'll see if that can help me... -- Posted via http://www.ruby-forum.com/. From sanjay at jarna.com Tue Mar 6 00:39:13 2007 From: sanjay at jarna.com (Sanjay Kapoor) Date: Tue, 6 Mar 2007 06:39:13 +0100 Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret In-Reply-To: <46c98b69e8256bd86b075184ed9e286c@ruby-forum.com> References: <20070204193455.GB29012@cordoba.webit.de> <9dfbb7b7672096f845c20c1b11370f57@ruby-forum.com> <20070304214947.GB28769@cordoba.webit.de> <46c98b69e8256bd86b075184ed9e286c@ruby-forum.com> Message-ID: <91221b6790a53d531ca8907ab97c0069@ruby-forum.com> I did some further testing using the DRb setup. This time, I kept the sort but removed the :limit and :offset options. The results were not properly sorted and only 10 items were returned, even though there are 48 matching results and no limit imposed on the query. Then I removed DRb from the setup and the 48 results came back properly sorted in a single query. Sanjay Sanjay Kapoor wrote: > I'm using Rails 1.1.6 with acts_as_ferret and DRb.
It seems to work for > basic queries, but I've run into a problem when sorting and using > the :limit and :offset options for pagination. I find that the query > results are no longer sorted by the sort field, and I seem to get the > same results irrespective of the :limit and :offset parameters. If I > don't sort the results, the :limit and :offset parameters work as > expected when using DRb. > > When I removed DRb from the setup, the sorting and pagination options > worked as expected. Has anyone else come across this problem? > > Sanjay > > Jens Kraemer wrote: >> On Sun, Mar 04, 2007 at 08:59:26AM +0100, donut donut wrote: >>> Jens Kraemer wrote: >> [..] >>> > >>> > I wrote some documentation to get you started with the remote indexing >>> > stuff at >>> > http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer >>> > >>> >>> Hi, thanks for the great work! Does this work with Rails 1.1.6 as I >>> haven't made the switch to 1.2.1 yet? >> >> I didn't test with 1.1.6 but it should work. >> >> Jens >> >> -- >> Jens Krämer >> webit! Gesellschaft für neue Medien mbH >> Schnorrstraße 76 | 01069 Dresden >> Telefon +49 351 46766-0 | Telefax +49 351 46766-66 >> kraemer at webit.de | www.webit.de >> >> Amtsgericht Dresden | HRB 15422 >> GF Sven Haubold, Hagen Malessa -- Posted via http://www.ruby-forum.com/. From adam.thorsen at gmail.com Tue Mar 6 00:48:08 2007 From: adam.thorsen at gmail.com (Adam Thorsen) Date: Tue, 6 Mar 2007 06:48:08 +0100 Subject: [Ferret-talk] case-sensitivity of analyzer Message-ID: Is there anything about this analyzer that says "case-sensitive" to you?

module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

Just wondering how I can force my index to be case-insensitive. Thanks, -Adam -- Posted via http://www.ruby-forum.com/.
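Whether an analyzer chain like the one above is case-sensitive comes down to whether any stage lowercases the tokens before they are indexed. The toy pipeline below uses plain Ruby lambdas as hypothetical stand-ins for a tokenizer, a lowercasing filter and a stemmer (the "stemmer" only strips a trailing "s"). It does not model the actual defaults of Ferret's StandardTokenizer; it only shows how the order of stages changes the indexed terms.

```ruby
# Toy analysis chain (tokenizer -> filters) built from plain lambdas.
# Hypothetical stand-ins, not Ferret's StandardTokenizer/LowerCaseFilter/StemFilter.
tokenize  = ->(text)   { text.scan(/\w+/) }
lowercase = ->(tokens) { tokens.map(&:downcase) }
stem      = ->(tokens) { tokens.map { |t| t.sub(/s\z/, "") } }

# Same shape as the analyzer in the question: tokenize, then stem.
without_lower = ->(text) { stem.(tokenize.(text)) }
# With a lowercasing step between the tokenizer and the stemmer.
with_lower    = ->(text) { stem.(lowercase.(tokenize.(text))) }

p without_lower.("Ferret Fields") # => ["Ferret", "Field"]
p with_lower.("Ferret Fields")    # => ["ferret", "field"]

# A lowercase query term only matches terms produced by the second chain.
p without_lower.("Ferret Fields").include?("ferret") # => false
p with_lower.("Ferret Fields").include?("ferret")    # => true
```

The same query-time analyzer should normally be applied to the search terms too, so that both sides of the match are lowercased consistently.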
From jgelens at gmail.com Tue Mar 6 00:51:27 2007 From: jgelens at gmail.com (Jeffrey Gelens) Date: Tue, 6 Mar 2007 06:51:27 +0100 Subject: [Ferret-talk] Problem with large index file In-Reply-To: References: <76885f200ff968a2fc0a3eef8c8f8a68@ruby-forum.com> Message-ID: <9cba8991f3e22aa84b9026d6d89b1de4@ruby-forum.com> After optimization the exact same problem occurs. Greetings, Jeffrey Gelens David Balmain wrote: > Hi Jeffrey, > > That's great to hear. If you have a chance, could you try copying the > index (cp -r), then opening the copy and optimizing it? Then let me > know if you are still getting the same problem you were getting > before. I understand if this is too much trouble; 5GB is a lot of data > to be playing around with. > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/. From admin at mightytofu.com Tue Mar 6 03:24:05 2007 From: admin at mightytofu.com (Ted) Date: Tue, 6 Mar 2007 09:24:05 +0100 Subject: [Ferret-talk] Getting non-stemmed terms from IndexReader In-Reply-To: References: Message-ID: <63fc8c31a24ea84db126248012036ef4@ruby-forum.com> Got it. I had thought that 'flush' would do the trick, but I guess not. I think I will have to call optimize then, but only when necessary. Thanks for your response. David Balmain wrote: > On 3/6/07, Ted wrote: >> I encountered another problem: >> >> After I removed docs from the index, the doc_freq returned by >> IndexReader.terms is not updated. It always shows the old number, or a >> bigger number after more docs with that term are added. >> So it looks like the doc_freq is not updated correctly on removal of a >> doc. > > This is impossible to fix without ruining performance. To fix this > problem I would basically need to optimize the index after every > deletion. In fact, you can do this yourself if you like. Just optimize > the index whenever you need to rely on the doc frequency being correct > and you have possible deletions in the index.
> > Cheers, > Dave -- Posted via http://www.ruby-forum.com/. From henke at mac.se Tue Mar 6 04:23:55 2007 From: henke at mac.se (Henrik Zagerholm) Date: Tue, 6 Mar 2007 10:23:55 +0100 Subject: [Ferret-talk] Highlight method creates malformed characters. Message-ID: <0910D526-1CF7-43A3-BFEF-8C3AB155BCA3@mac.se> Hello list, I've just updated to 0.11.3 and I came upon this problem. When using the highlight function in a Rails application I sometimes get the error "Malformed UTF-8 character." It only happens on certain documents, and if I change the excerpt length the error disappears, so it seems that when the highlight method cuts the text at the wrong character, the function returns a string with malformed characters. This then causes the error when the text is output. Is this a problem with Ferret or with Rails? I'm still on Rails 1.1.6 and I haven't upgraded to Rails 1.2 yet, so I'm not using the new multibyte Unicode functions. Regards, henrik From kraemer at webit.de Tue Mar 6 04:34:39 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 6 Mar 2007 10:34:39 +0100 Subject: [Ferret-talk] case-sensitivity of analyzer In-Reply-To: References: Message-ID: <20070306093439.GA28036@cordoba.webit.de> On Tue, Mar 06, 2007 at 06:48:08AM +0100, Adam Thorsen wrote: > Is there anything about this analyzer that says "case-sensitive" to you? yep :-) There's no LowerCaseFilter involved.

StemFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)))

should do the trick. Jens

> module Ferret::Analysis
>   class StemmingAnalyzer
>     def token_stream(field, text)
>       StemFilter.new(StandardTokenizer.new(text))
>     end
>   end
> end

> Just wondering how I can force my index to be case-insensitive. > > Thanks, > -Adam > > -- > Posted via http://www.ruby-forum.com/. > -- Jens Krämer webit!
Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Tue Mar 6 04:44:56 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 6 Mar 2007 10:44:56 +0100 Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret In-Reply-To: <91221b6790a53d531ca8907ab97c0069@ruby-forum.com> References: <20070204193455.GB29012@cordoba.webit.de> <9dfbb7b7672096f845c20c1b11370f57@ruby-forum.com> <20070304214947.GB28769@cordoba.webit.de> <46c98b69e8256bd86b075184ed9e286c@ruby-forum.com> <91221b6790a53d531ca8907ab97c0069@ruby-forum.com> Message-ID: <20070306094456.GB28036@cordoba.webit.de> Hi! On Tue, Mar 06, 2007 at 06:39:13AM +0100, Sanjay Kapoor wrote: > I did some further testing using the DRb setup. This time, I kept the > sort but removed the :limit and :offset options. The results were not > properly sorted and only 10 items were returned, even though there are > 48 matching results and no limit imposed on the query. Then I removed > DRb from the setup and the 48 results came back properly sorted in a > single query. Looks like you just found a bug in the DRb code - I'll try to fix this asap. Could you please post your call to find_by_contents, including the construction of your SortFields? The acts_as_ferret statement in your model might help, too. About the number of results returned - are you sure you got all 48 results back from a call to find_by_contents without a :limit parameter? By default only 10 hits will be returned, and you'll need to pass :limit => :all for aaf to give you all results. However, results.total_hits will give you the total number of results. Maybe only that value is different with or without DRb? Jens -- Jens Krämer webit!
Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From erik at ehatchersolutions.com Tue Mar 6 04:48:27 2007 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Tue, 6 Mar 2007 04:48:27 -0500 Subject: [Ferret-talk] Lucene index compatibility In-Reply-To: References: <05b3acd9fed1492c88fc13d24c41f1dd@ruby-forum.com> Message-ID: <9BD60075-E358-475C-A79F-1D5F98ED4C9F@ehatchersolutions.com> solr-ruby is, of course, a library to connect Solr to Ruby. So you won't be ditching Java to go to Solr, but rather running the Solr Java-based web application via a Java web container (Jetty, Tomcat, Resin, or other). Solr is incredibly scalable and fast, and handles any number of connections at a time for both reading and writing. Erik On Mar 6, 2007, at 12:32 AM, Joe Stelmach wrote: > Thanks for the fast reply Dave. > > I'm working on a system that does some back-end processing in Java, > but > uses Rails on the front end. I'd really like to ditch Java > completely, > but the Threading support in Ruby is very limiting (at least as far > as I > understand it,) so I'm stuck trying to glue the pieces together. > > Thanks for the solr tip - I'll see if that can help me... > > > > -- > Posted via http://www.ruby-forum.com/.
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From kraemer at webit.de Tue Mar 6 04:53:33 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 6 Mar 2007 10:53:33 +0100 Subject: [Ferret-talk] programatically stopping acts_as_ferret drb server In-Reply-To: References: <20070305165327.GE25625@cordoba.webit.de> Message-ID: <20070306095333.GC28036@cordoba.webit.de> On Tue, Mar 06, 2007 at 02:05:20AM +0100, Adam Thorsen wrote: > Ok I've put together a couple of scripts that are basically a hodgepodge > of daemonize, mongrel, and aaf code that start and stop ferret_server. > > The start script looks for the pid_file value in ferret_server.yml, then > creates a pid file in config based on that value, forks, and exits. > > The stop script looks for the pid_file in the same place and sends the > TERM signal. Upon receiving the TERM signal, the script that launched > ferret_server removes the pid file and exits. cool, I'll add this to trunk asap! cheers, Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From neongrau at gmail.com Tue Mar 6 05:46:26 2007 From: neongrau at gmail.com (neongrau __) Date: Tue, 6 Mar 2007 11:46:26 +0100 Subject: [Ferret-talk] aaf excluding certain db records from indexing Message-ID: <48c378c9209d19f9e29c98b7e98e7c38@ruby-forum.com> hi! short question about aaf: is there a builtin functionality in acts_as_ferret to exclude records from being indexed when for example a column "is_deleted" / "is_disabled" / "dont_index" has a certain value? regards neongrau -- Posted via http://www.ruby-forum.com/.
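neongrau's question is answered further down this thread: override the model's ferret_enabled? instance method, and, for index rebuilds, guard the loop in rebuild_index with index << rec.to_doc if rec.ferret_enabled?. The plain-Ruby sketch below only illustrates that guard. The Article class, its is_deleted flag, the hash returned by to_doc and the array standing in for the index are hypothetical mocks so the example runs without Rails or Ferret; they are not the acts_as_ferret API.

```ruby
# Hypothetical stand-in for an ActiveRecord model using acts_as_ferret.
# Only the ferret_enabled? override mirrors the approach from the thread;
# everything else is a mock so the sketch runs without Rails or Ferret.
class Article
  attr_reader :id, :title, :is_deleted

  def initialize(id, title, is_deleted = false)
    @id = id
    @title = title
    @is_deleted = is_deleted
  end

  # Skip records flagged as deleted; same result as
  # `self.is_deleted ? false : true`, written more directly.
  def ferret_enabled?
    !is_deleted
  end

  # Mock document (acts_as_ferret would build a Ferret document here).
  def to_doc
    { :id => id, :title => title }
  end
end

# Mock rebuild loop with the guard Jens suggests for class_methods.rb.
def rebuild_index(records)
  index = []
  records.each do |rec|
    index << rec.to_doc if rec.ferret_enabled?
  end
  index
end

records = [
  Article.new(1, "kept"),
  Article.new(2, "removed", true),
  Article.new(3, "also kept")
]

p rebuild_index(records).map { |doc| doc[:id] } # => [1, 3]
```

Without the guard in the rebuild loop, record 2 would be re-added on every rebuild even though ferret_enabled? excludes it from incremental updates, which is the gap Jens points out for rebuild_index.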
From dbalmain.ml at gmail.com Tue Mar 6 06:00:46 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 22:00:46 +1100 Subject: [Ferret-talk] Problem with large index file In-Reply-To: <9cba8991f3e22aa84b9026d6d89b1de4@ruby-forum.com> References: <76885f200ff968a2fc0a3eef8c8f8a68@ruby-forum.com> <9cba8991f3e22aa84b9026d6d89b1de4@ruby-forum.com> Message-ID: On 3/6/07, Jeffrey Gelens wrote: > After optimization the exact same problem occurs. Thanks Jeffrey, I'll keep looking into this. I'm glad your index works for the moment though. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From admin at mightytofu.com Tue Mar 6 06:02:17 2007 From: admin at mightytofu.com (Ted) Date: Tue, 6 Mar 2007 12:02:17 +0100 Subject: [Ferret-talk] in acts_as_ferret, excluding records from rebuild_index Message-ID: Is there any way for the automatic 'rebuild_index' of 'acts_as_ferret' to exclude certain records from being included in the index? -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Tue Mar 6 06:04:22 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 6 Mar 2007 22:04:22 +1100 Subject: [Ferret-talk] Highlight method creates malformed characters. In-Reply-To: <0910D526-1CF7-43A3-BFEF-8C3AB155BCA3@mac.se> References: <0910D526-1CF7-43A3-BFEF-8C3AB155BCA3@mac.se> Message-ID: On 3/6/07, Henrik Zagerholm wrote: > Hello list, > > I've just updated to 0.11.3 and I came upon this problem. > When using the highlight function in a rails application I sometimes > get the error "Malformed UTF-8 character." > > It is just on certain documents and if I change the excerpt length > the error disappears so it feels that if the highlight method cuts > the text at the wrong character the function returns a string with > malformed characters. > This then causes the error when the text is outputted. > > Is this a problem with ferret or with rails?
> I'm still on 1.1.6 and I haven't upgraded to rails 1.2 yet so I'm not > using the new multi byte unicode functions. Hi Henrik, This must be a problem with Ferret. I'm not sure how it is happening though. If you can narrow it down to a failing test case it would be really helpful. Thanks, Dave -- Dave Balmain http://www.davebalmain.com/ From john at johnleach.co.uk Tue Mar 6 07:08:30 2007 From: john at johnleach.co.uk (John Leach) Date: Tue, 06 Mar 2007 12:08:30 +0000 Subject: [Ferret-talk] ferret 0.11.3 - File Not Found Message-ID: <1173182910.12321.7.camel@localhost.localdomain> Hi, I noticed a 0.11.3 release and gave it a whirl. I've so far not been able to reproduce any segfaults with my ferret_killer[1] script, though I do get some errors. When the searching process is running and I start the indexing process, I immediately get: /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:692:in `latest?': File Not Found Error occured at :93 in xraise (FileNotFoundError) Error occured in index.c:825 - sis_find_segments_file couldn't find segments file from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:692:in `ensure_reader_open' from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:713:in `ensure_searcher_open' from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:735:in `do_search' from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:334:in `search' from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:333:in `search' from ferret_killer1.rb:39:in `ferret_search' from ferret_killer1.rb:59 When the indexing process is running and I start the searching process, I immediately get one of the following: /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:285:in `<<': Interrupt from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:8:in `synchrolock' from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' from
/usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:8:in `synchrolock' from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:267:in `<<' from ferret_killer1.rb:50 /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:285:in `delete': File Not Found Error occured at :117 in xpop_context (FileNotFoundError) Error occured in fs_store.c:329 - fs_open_input tried to open "my_ferret_index//_1og.cfs" but it doesn't exist: from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:285:in `<<' from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:8:in `synchrolock' from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:8:in `synchrolock' from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:267:in `<<' from ferret_killer1.rb:50 Hope this is useful, John. [1] http://johnleach.co.uk/downloads/ruby/ferret/ferret_killer1.rb From henke at mac.se Tue Mar 6 07:34:34 2007 From: henke at mac.se (Henrik Zagerholm) Date: Tue, 6 Mar 2007 13:34:34 +0100 Subject: [Ferret-talk] Highlight method creates malformed characters. In-Reply-To: References: <0910D526-1CF7-43A3-BFEF-8C3AB155BCA3@mac.se> Message-ID: <74D1E80A-9BA5-4C5E-A006-4AE0204C8D6B@mac.se> On 6 Mar 2007, at 12:04, David Balmain wrote: > On 3/6/07, Henrik Zagerholm wrote: >> Hello list, >> >> I've just updated to 0.11.3 and I came upon this problem. >> When using the highlight function in a rails application I sometimes >> get the error "Malformed UTF-8 character." >> >> It is just on certain documents and if I change the excerpt length >> the error disappears so it feels that if the highlight method cuts >> the text at the wrong character the function returns a string with >> malformed characters. >> This then causes the error when the text is outputted. >> >> Is this a problem with ferret or with rails?
>> I'm still on 1.1.6 and I haven't upgraded to rails 1.2 yet so I'm not >> using the new multi byte unicode functions. > > Hi Henrik, > This must be a problem with Ferret. I'm not sure how it is happening > though. If you can narrow it down to a failing test case it would be > really helpful. Hi Dave, I'll do some more testing and I'll try to present a small test case for you. You'll have it in a day or two. Cheers, Henrik > > Thanks, > Dave > > > -- > Dave Balmain > http://www.davebalmain.com/ > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From kraemer at webit.de Tue Mar 6 07:49:19 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 6 Mar 2007 13:49:19 +0100 Subject: [Ferret-talk] aaf excluding certain db records from indexing In-Reply-To: <48c378c9209d19f9e29c98b7e98e7c38@ruby-forum.com> References: <48c378c9209d19f9e29c98b7e98e7c38@ruby-forum.com> Message-ID: <20070306124919.GF28036@cordoba.webit.de> On Tue, Mar 06, 2007 at 11:46:26AM +0100, neongrau __ wrote: > hi! > > short question about aaf: > > is there a builtin functionality in acts_as_ferret to exclude records > from being indexed when for example a column "is_deleted" / > "is_disabled" / "dont_index" has a certain value? just override the ferret_enabled? instance method. cheers, Jens -- Jens Krämer webit!
Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Tue Mar 6 07:58:24 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 6 Mar 2007 13:58:24 +0100 Subject: [Ferret-talk] in acts_as_ferret, excluding records from rebuild_index In-Reply-To: References: Message-ID: <20070306125824.GG28036@cordoba.webit.de> On Tue, Mar 06, 2007 at 12:02:17PM +0100, Ted wrote: > Is there any way for the automatic 'rebuild_index' of 'acts_as_ferret' > to exclude certain records from being included in the index? rebuild_index uses Model.find(:all), so if you override that to not return all records (as acts_as_paranoid does to not include records marked as deleted) this will be possible. Another way would be to patch rebuild_index so it only indexes records where ferret_enabled? returns true. You could then override this method to check whatever condition determines whether the record should be indexed. For 0.3.1/stable simply change line 205 in class_methods.rb from index << rec.to_doc to index << rec.to_doc if rec.ferret_enabled? I think I'll add this check in the next release. Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From neongrau at gmail.com Tue Mar 6 08:18:34 2007 From: neongrau at gmail.com (neongrau __) Date: Tue, 6 Mar 2007 14:18:34 +0100 Subject: [Ferret-talk] aaf excluding certain db records from indexing In-Reply-To: <20070306124919.GF28036@cordoba.webit.de> References: <48c378c9209d19f9e29c98b7e98e7c38@ruby-forum.com> <20070306124919.GF28036@cordoba.webit.de> Message-ID: Jens Kraemer wrote: s_disabled" / "dont_index" has a certain value? > > just override the ferret_enabled?
instance method. > hello Jens! so i just put something like this into my model? def ferret_enabled? self.is_deleted ? false : true end and aaf will skip the record when indexing? -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue Mar 6 08:21:08 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 6 Mar 2007 14:21:08 +0100 Subject: [Ferret-talk] aaf excluding certain db records from indexing In-Reply-To: References: <48c378c9209d19f9e29c98b7e98e7c38@ruby-forum.com> <20070306124919.GF28036@cordoba.webit.de> Message-ID: <20070306132108.GH28036@cordoba.webit.de> On Tue, Mar 06, 2007 at 02:18:34PM +0100, neongrau __ wrote: > Jens Kraemer wrote: > s_disabled" / "dont_index" has a certain value? > > > > just override the ferret_enabled? instance method. > > > > hello Jens! > > so i just put something like this into my model? > > def ferret_enabled? > self.is_deleted ? false : true > end > > and aaf will skip the record when indexing? That's right. However, this does not apply to index rebuilds; please see my other recent mail to the list for a solution to this problem. Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From chad at zulu.net Tue Mar 6 09:31:32 2007 From: chad at zulu.net (Chad Thatcher) Date: Tue, 6 Mar 2007 15:31:32 +0100 Subject: [Ferret-talk] Need clarification of documentation In-Reply-To: References: <44337b07d5a83c4c3cec1e3a035eb3cf@ruby-forum.com> Message-ID: <9d997333c56732f77c7a54ff26446fa9@ruby-forum.com> Thanks Dave, that has cleared everything up for me. Excellent engine by the way, thanks for all your hard work on this. -- Posted via http://www.ruby-forum.com/. From no at spam.com Tue Mar 6 14:18:54 2007 From: no at spam.com (mix) Date: Tue, 6 Mar 2007 20:18:54 +0100 Subject: [Ferret-talk] bug or "feature"?
Message-ID: hi, i'm trying ferret, i've a model which has some records and two of them have a title with the word 'again' (one or more times), so i've tried to do a search for 'again', but i didn't find anything...i've edited the title with 'test again', searched for 'test', and i've found them....another time with 'again', but nothing, so i've tried with 'my test'....searched for 'my', and....nothing...is it a problem or like a "feature"? another little problem i've found is that i've written this for the search: acts_as_ferret :fields => {:title => {}, :category_id => {}, :bought_at_int => {:index => :untokenized_omit_norms, :term_vector => :no}, :gift => {:index => :untokenized_omit_norms, :term_vector => :no}} def self.full_text_search(query, category_id) return nil if query.nil? or (query == '') query += " +category_id:{category_id} +bought_at_int:>#{Time.now.to_i} +gift:false" sort = Ferret::Search::SortField.new(:bought_at_int, :type => :byte, :reverse => false) self.find_by_contents(query, {:limit => :all, :sort => sort}, { :include => [:user] }) end this works....initially i've tried with query = "+title:#{query} +bought_at_int:>#{Time.now.to_i} +gift:false" but this doesn't work, can someone tell me why? i think it is the same....(the second is better because the query is only for the title and not for other fields) -- Posted via http://www.ruby-forum.com/. From bk at benjaminkrause.com Tue Mar 6 14:48:08 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Tue, 06 Mar 2007 20:48:08 +0100 Subject: [Ferret-talk] bug or "feature"?
In-Reply-To: References: Message-ID: <45EDC578.40708@benjaminkrause.com> mix wrote: > hi, i'm trying ferret, i've a model which has some records and two of > them have a title with the word 'again' (one or more times), so i've > tried to do a search for 'again', but i didn't find anything...i've > edited the title with 'test again', searched for 'test', and i've found > them....another time with 'again', but nothing, so i've tried with 'my > test'....searched for 'my', and....nothing...is it a problem or like a > "feature"? try this: bash$ script/console >> Ferret::Analysis::FULL_ENGLISH_STOP_WORDS and see: http://ferret.davebalmain.com/api/classes/Ferret/Analysis.html Ben :) From no at spam.com Tue Mar 6 14:55:18 2007 From: no at spam.com (mix) Date: Tue, 6 Mar 2007 20:55:18 +0100 Subject: [Ferret-talk] bug or "feature"? In-Reply-To: <45EDC578.40708@benjaminkrause.com> References: <45EDC578.40708@benjaminkrause.com> Message-ID: <3448057cc7285a7842df9e190db69797@ruby-forum.com> Benjamin Krause wrote: > mix wrote: >> hi, i'm trying ferret, i've a model which has some records and two of >> them have a title with the word 'again' (one or more times), so i've >> tried to do a search for 'again', but i didn't find anything...i've >> edited the title with 'test again', searched for 'test', and i've found >> them....another time with 'again', but nothing, so i've tried with 'my >> test'....searched for 'my', and....nothing...is it a problem or like a >> "feature"? > > try this: > > bash$ script/console >>> Ferret::Analysis::FULL_ENGLISH_STOP_WORDS > > and see: http://ferret.davebalmain.com/api/classes/Ferret/Analysis.html > > Ben :) ok, now it is clearer :) but i've just another problem.... if i search something like +, "", References: <1173182910.12321.7.camel@localhost.localdomain> Message-ID:
I've so far not been > able to reproduce any segfaults with my ferret_killer[1] script, though > I do get some errors. > Errors Hi John, I should have told you about this already. It is actually a problem in the ferret_killer script. You have :create set to true so both processes will delete the index and create a new one. Obviously this is not a good thing if one processes is writing to the index and another comes along and deletes all the files. To prevent this from happening, remove line 31; :create => true) After that, everything should work as expected. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Tue Mar 6 17:55:49 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 7 Mar 2007 09:55:49 +1100 Subject: [Ferret-talk] Highlight method creates malformed characters. In-Reply-To: <74D1E80A-9BA5-4C5E-A006-4AE0204C8D6B@mac.se> References: <0910D526-1CF7-43A3-BFEF-8C3AB155BCA3@mac.se> <74D1E80A-9BA5-4C5E-A006-4AE0204C8D6B@mac.se> Message-ID: On 3/6/07, Henrik Zagerholm wrote: > Hi Dave, > > I'll do some more testing and I'll try to present a small test case > for you. > > You'll have it in a day or two. Hi Henrik, If it is any easier, all I probably need is the failing search string. It'd be even better if you could give me the matching text that is supposed to be highlighted but I can probably work it out from there. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From Neville.Burnell at bmsoft.com.au Tue Mar 6 20:12:34 2007 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Wed, 7 Mar 2007 12:12:34 +1100 Subject: [Ferret-talk] A note about #search vs #search_each Message-ID: <126EC586577FD611A28E00A0C9A03758B5C5E1@maui.bmsoft.com.au> Hi, I just "solved" an issue which I mentioned on this list many moons ago, regarding searches somehow being serialized, such that a long search would cause others to wait noticeably. 
Anyhow, after coding both :limit and applying homegrown thread time limit monitoring, I discovered that Searcher#search_each uses rb_thread_critical = Qtrue whereas Searcher#search doesn't. I changed my code to use TopDocs and Searcher#search and voilà, concurrent searches! Cheers, Neville From john at johnleach.co.uk Tue Mar 6 21:06:08 2007 From: john at johnleach.co.uk (John Leach) Date: Wed, 07 Mar 2007 02:06:08 +0000 Subject: [Ferret-talk] ferret 0.11.3 - File Not Found In-Reply-To: References: <1173182910.12321.7.camel@localhost.localdomain> Message-ID: <1173233168.27167.4.camel@localhost.localdomain> On Wed, 2007-03-07 at 09:52 +1100, David Balmain wrote: > Hi John, > > I should have told you about this already. It is actually a problem in > the ferret_killer script. You have :create set to true so both > processes will delete the index and create a new one. lol, silly me. Ta Dave. Though I've just discovered that the rails app that I extracted this from is still segfaulting the searching process :/ /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:738: [BUG] Segmentation fault I'll try and write another test to reproduce. Thanks, John. From sanjay at jarna.com Wed Mar 7 00:31:37 2007 From: sanjay at jarna.com (Sanjay Kapoor) Date: Wed, 7 Mar 2007 06:31:37 +0100 Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret In-Reply-To: <20070306094456.GB28036@cordoba.webit.de> References: <20070204193455.GB29012@cordoba.webit.de> <9dfbb7b7672096f845c20c1b11370f57@ruby-forum.com> <20070304214947.GB28769@cordoba.webit.de> <46c98b69e8256bd86b075184ed9e286c@ruby-forum.com> <91221b6790a53d531ca8907ab97c0069@ruby-forum.com> <20070306094456.GB28036@cordoba.webit.de> Message-ID: <596199d7e57a4ff0249a31e8911e4fbc@ruby-forum.com> My model's full_text_search method looks like:

  def self.full_text_search(q, options = {})
    return nil if q.nil? or q.empty?
    default_options = {:limit => 10, :page => 1}
    default_options.update options
    # get the offset based on the page and limit
    default_options[:page] = 1 if default_options[:page] == 0
    default_options[:offset] = default_options[:limit] * (default_options.delete(:page).to_i-1)
    # now do the query with options
    results = Article.find_by_contents(q, default_options)
    return [results.total_hits, results]
  end

My search controller uses it like this:

  # build sort
  @sf_published_at = Ferret::Search::SortField.new(:published_at_string, :type => :string, :reverse => true)
  @sort = Ferret::Search::Sort.new(@sf_published_at)
  # set options
  @options = {:limit => 20, :sort => @sort}
  @total, @articles = Article.full_text_search(@query_string, @options)

And yes, you're absolutely right about sending in :limit => :all. I forgot to mention that in my earlier post. I'm currently getting around this issue by returning all the results and sorting in ruby. Thanks for looking into this. Sanjay Jens Kraemer wrote: > Hi! > > On Tue, Mar 06, 2007 at 06:39:13AM +0100, Sanjay Kapoor wrote: >> I did some further testing using the DRb setup. This time, I kept the >> sort but removed the :limit and :offset options. The results were not >> properly sorted and only 10 items were returned, even though there are >> 48 matching results and no limit imposed on the query. Then I removed >> the DRb from the setup and the 48 results came back properly sorted in a >> single query. > > Looks like you just found a bug in the DRb code - I'll try to fix this > asap. Could you please post your call to find_by_contents, including the > construction of your SortFields? > The acts_as_ferret statement in your model might help, too. > > About the number of results returned - are you sure you got all 48 > results back from a call to find_by_contents without :limit parameter? > By default only 10 hits will be returned and you'll need to pass > :limit => :all for aaf to give you all results. However
However > results.total_hits will give you the total number of results. Maybe only > that value is different with or without DRb? > > Jens > > > -- > Jens Kr?mer > webit! Gesellschaft f?r neue Medien mbH > Schnorrstra?e 76 | 01069 Dresden > Telefon +49 351 46766-0 | Telefax +49 351 46766-66 > kraemer at webit.de | www.webit.de > > Amtsgericht Dresden | HRB 15422 > GF Sven Haubold, Hagen Malessa -- Posted via http://www.ruby-forum.com/. From no at spam.com Wed Mar 7 17:42:40 2007 From: no at spam.com (mix) Date: Wed, 7 Mar 2007 23:42:40 +0100 Subject: [Ferret-talk] bug or "feature"? In-Reply-To: <3448057cc7285a7842df9e190db69797@ruby-forum.com> References: <45EDC578.40708@benjaminkrause.com> <3448057cc7285a7842df9e190db69797@ruby-forum.com> Message-ID: <921e0f4ff2927766886b97f6a85876df@ruby-forum.com> mix wrote: > but i've just another problem.... if i search something like +, "", and so on it found me all records O_o do you know why? > is it correct the query ? about the title:#{query} any words? :( > thanks :) anyone? :( -- Posted via http://www.ruby-forum.com/. From samuelgiffney at gmail.com Wed Mar 7 18:04:02 2007 From: samuelgiffney at gmail.com (Sam Giffney) Date: Thu, 8 Mar 2007 00:04:02 +0100 Subject: [Ferret-talk] bug or "feature"? In-Reply-To: <921e0f4ff2927766886b97f6a85876df@ruby-forum.com> References: <45EDC578.40708@benjaminkrause.com> <3448057cc7285a7842df9e190db69797@ruby-forum.com> <921e0f4ff2927766886b97f6a85876df@ruby-forum.com> Message-ID: <7e72ec656c3b196aa2f5ee2aa32bdfd2@ruby-forum.com> mix wrote: > mix wrote: >> but i've just another problem.... if i search something like +, "", > and so on it found me all records O_o do you know why? >> is it correct the query ? about the title:#{query} any words? :( >> thanks :) Maybe just me, but I can't understand your question. Can you try rephrasing or perhaps including some code and results? -- Posted via http://www.ruby-forum.com/. 
From no at spam.com Wed Mar 7 18:15:55 2007 From: no at spam.com (mix) Date: Thu, 8 Mar 2007 00:15:55 +0100 Subject: [Ferret-talk] bug or "feature"? In-Reply-To: <7e72ec656c3b196aa2f5ee2aa32bdfd2@ruby-forum.com> References: <45EDC578.40708@benjaminkrause.com> <3448057cc7285a7842df9e190db69797@ruby-forum.com> <921e0f4ff2927766886b97f6a85876df@ruby-forum.com> <7e72ec656c3b196aa2f5ee2aa32bdfd2@ruby-forum.com> Message-ID: <2ea6fc3717b00e2e4934f54d9fe4469f@ruby-forum.com> Sam Giffney wrote: > mix wrote: >> mix wrote: >>> but i've just another problem.... if i search something like +, "", >> and so on it found me all records O_o do you know why? >>> is it correct the query ? about the title:#{query} any words? :( >>> thanks :) > > Maybe just me, but I can't understand your question. Can you try > rephrasing or perhaps including some code and results? sure...so, i've a model which has this code: ---------------- acts_as_ferret :fields => {:title => {}, :all_categories => {}, :user_id => {}, :bought_at_int => {:index => :untokenized_omit_norms, :term_vector => :no}, :gift => {:index => :untokenized_omit_norms, :term_vector => :no}} def self.full_text_search(query, category_id, extra) return nil if query.nil? 
or (query == '') query += " +all_categories:#{category_id} +bought_at_int:<#{15.days.from_now.to_i} +gift:false #{extra}" sort = Ferret::Search::SortField.new(:expires_at_int, :type => :byte, :reverse => false) self.find_by_contents(query, {:limit => :all, :sort => sort}, { :include => [:user, :categories] }) end ---------------- the search controller is: def show @category = Category.find_by_id(params[:category_id] || 1) @obj = Model.full_text_search(params[:query], @category.id, '') end ----------------- (i use the extra variable in another part to search for a user, like: ' +user:1') anyway, if i search for a title (which i have in the db) 'test', ferret found me without problem, or another title like 'fishing guide', and the result is pretty good (ok ok, is perfect :)), but when i search for something like: '+', '""', ' References: <45EDC578.40708@benjaminkrause.com> <3448057cc7285a7842df9e190db69797@ruby-forum.com> <921e0f4ff2927766886b97f6a85876df@ruby-forum.com> <7e72ec656c3b196aa2f5ee2aa32bdfd2@ruby-forum.com> <2ea6fc3717b00e2e4934f54d9fe4469f@ruby-forum.com> Message-ID: <20070308090249.GP28036@cordoba.webit.de> On Thu, Mar 08, 2007 at 12:15:55AM +0100, mix wrote: [..] > > (i use the extra variable in another part to search for a user, like: ' > +user:1') > anyway, if i search for a title (which i have in the db) 'test', ferret > found me without problem, or another title like 'fishing guide', and the > result is pretty good (ok ok, is perfect :)), but when i search for > something like: '+', '""', ' db.... why? :( > and so on it found me all records could you please have a look into your application's log file, and look for the query aaf actually runs against the ferret index? the relevant line should start with 'query: ' Jens -- Jens Krämer webit!
Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From no at spam.com Thu Mar 8 08:22:45 2007 From: no at spam.com (mix) Date: Thu, 8 Mar 2007 14:22:45 +0100 Subject: [Ferret-talk] bug or "feature"? In-Reply-To: <20070308090249.GP28036@cordoba.webit.de> References: <45EDC578.40708@benjaminkrause.com> <3448057cc7285a7842df9e190db69797@ruby-forum.com> <921e0f4ff2927766886b97f6a85876df@ruby-forum.com> <7e72ec656c3b196aa2f5ee2aa32bdfd2@ruby-forum.com> <2ea6fc3717b00e2e4934f54d9fe4469f@ruby-forum.com> <20070308090249.GP28036@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > could you please have a look into your application's log file, and look > for the query aaf actually runs against the ferret index? the relevant > line should start with 'query: ' > > Jens > "Query: +all_categories:1 +bought_at_int:<1174655577 +gift:false" with this and a search '+' it found all books (it replaces '+' with ' '), same for '?' now i've changed the query to: query = "+title:#{query} +all_categories:#{category_id} +bought_at_int:<#{15.days.from_now.to_i} +gift:false #{extra}" this works, now with '?' it doesn't find anything, but... if i search a stop word (my, all, your, etc) it finds all books :( the log: Query: +title:all +all_categories:1 +bought_at_int:<1174655577 +gift:false -- Posted via http://www.ruby-forum.com/. From lajanus at o2.pl Thu Mar 8 11:14:29 2007 From: lajanus at o2.pl (Linus) Date: Thu, 8 Mar 2007 17:14:29 +0100 Subject: [Ferret-talk] Test fail on debian 3.1 Message-ID: <39736ca639df5624bd5ca6e8b2622a92@ruby-forum.com> I have a problem: I use utf all over a rails site, but the search fails to find characters with accents... I tried to debug it and ran the unit tests for ferret; can those failures cause problems?
/usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/test/ ruby test_all.rb
Loading once
Loaded suite test_all
Started
................F.............................................FF...........................................................................................FF...FF
Finished in 4.729505 seconds.

  1) Failure:
test_custom_filter(CustomAnalyzerTest)
[./unit/../unit/index/../../unit/store/../../unit/analysis/tc_analyzer.rb:516]:
 expected but was
.

  2) Failure:
test_letter_analyzer(LetterAnalyzerTest)
[./unit/../unit/index/../../unit/store/../../unit/analysis/tc_analyzer.rb:100]:
 expected but was
.

  3) Failure:
test_letter_tokenizer(LetterTokenizerTest)
[./unit/../unit/index/../../unit/store/../../unit/analysis/tc_token_stream.rb:73]:
 expected but was
.

  4) Failure:
test_standard_analyzer(StandardAnalyzerTest)
[./unit/../unit/index/../../unit/store/../../unit/analysis/tc_analyzer.rb:275]:
 expected but was
.

  5) Failure:
test_standard_tokenizer(StandardTokenizerTest)
[./unit/../unit/index/../../unit/
 expected but was
.

  6) Failure:
test_white_space_analyzer(WhiteSpaceAnalyzerTest)
[./unit/../unit/index/../../un
 expected but was
.

  7) Failure:
test_whitespace_tokenizer(WhiteSpaceTokenizerTest)
[./unit/../unit/index/../../u
 expected but was
.

162 tests, 12082 assertions, 7 failures, 0 errors
-- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Thu Mar 8 16:11:25 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 9 Mar 2007 08:11:25 +1100 Subject: [Ferret-talk] [ANN] Ferret 0.11.0-rc1 In-Reply-To: References: Message-ID: On 3/9/07, Bosko Milekic wrote: > I am stress testing and here's a problem I just hit: > > - got a Rails app using AAF, indexing on several fields of the > NamespaceMembership model > > - in one rails console, I do this: > (0..10000).each do |i| > vt = VehicleType.new(...) > vt.save! > nm = NamespaceMembership.new(...) > nm.save! # this causes indexing to get triggered due to AAF after_create hook > vt = vt.id > nm = nm.id > # this causes indexing to get triggered due to AAF after_update hook: > NamespaceMembership.destroy(nm) > VehicleType.destroy(vt) > end > > - Then while this runs I open up a browser and repeatedly hit up a > controller action with searches with a keyword that i know will match > the NamespaceMembership model I'm adding and removing in a tightloop > above. When I do this most of the time the search returns empty, but > I've got it to fail a few times; what follows is the stack trace: > > Ferret::StateError (State Error occured at :93 in xraise > Error occured in index.c:4121 - sr_get_lazy_doc > Document 1 has already been deleted > > stack trace > > The second time it failed with a similar trace, except instead of > "Document 1 has already been deleted" it said "Document 0 has already > been deleted". > > This looks like it's coming from ferret. Know what's up? Yep, the search finds the document and then the document is deleted by the time it is looked up. I can fix this. > This is > ferret 0.11.3 (latest as of this writing). I haven't tried multiple > simultaneous writers (e.g., two console processes adding and deleting > in a tightloop) yet, but I will next. Unfortunately, Ferret doesn't support two writers to the index so you are likely to get a lot of errors doing this. 
If you want multiple writers, use the acts_as_ferret DRb solution. > I've CC'd ferret-talk to see if Jens Kraemer has some advice as well. Jens, not to worry about this one. I'll get it working. -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Thu Mar 8 16:17:47 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 9 Mar 2007 08:17:47 +1100 Subject: [Ferret-talk] Test fail on debian 3.1 In-Reply-To: <39736ca639df5624bd5ca6e8b2622a92@ruby-forum.com> References: <39736ca639df5624bd5ca6e8b2622a92@ruby-forum.com> Message-ID: On 3/9/07, Linus wrote: > I have a problem: I use UTF-8 all over a Rails site, but the search fails > to find characters with accents... > > I tried to debug it, and I ran the unit tests for Ferret; can those > failures cause problems? > >
> /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/test/ ruby test_all.rb
> Loading once
> Loaded suite test_all
> Started
> ................F.............................................FF...........................................................................................FF...FF
> Finished in 4.729505 seconds.
>
>   1) Failure:
> test_custom_filter(CustomAnalyzerTest)
> [./unit/../unit/index/../../unit/store/../../unit/analysis/tc_analyzer.rb:516]:
>  expected but was
> .
>
>   2) Failure:
> test_letter_analyzer(LetterAnalyzerTest)
> [./unit/../unit/index/../../unit/store/../../unit/analysis/tc_analyzer.rb:100]:
>            G?":55:62:1]> expected but was
>            G":55:60:1]>.
>
>   3) Failure:
> test_letter_tokenizer(LetterTokenizerTest)
> [./unit/../unit/index/../../unit/store/../../unit/analysis/tc_token_stream.rb:73]:
>            G?":55:62:1]> expected but was
>            G":55:60:1]>.
>
>   4) Failure:
> test_standard_analyzer(StandardAnalyzerTest)
> [./unit/../unit/index/../../unit/store/../../unit/analysis/tc_analyzer.rb:275]:
>  expected but was
> .
>
>   5) Failure:
> test_standard_tokenizer(StandardTokenizerTest)
> [./unit/../unit/index/../../unit/
>            G?":117:124:1]> expected but was
>            G":117:122:1]>.
>
>   6) Failure:
> test_white_space_analyzer(WhiteSpaceAnalyzerTest)
> [./unit/../unit/index/../../un
>  expected but was
>            G":55:60:1]>.
>
>   7) Failure:
> test_whitespace_tokenizer(WhiteSpaceTokenizerTest)
> [./unit/../unit/index/../../u
>  expected but was
>            G":55:60:1]>.
>
> 162 tests, 12082 assertions, 7 failures, 0 errors
> 
You need to have a UTF-8 locale installed or Ferret doesn't know how to deal with UTF-8 characters. Try typing locale at the command line to see what locale you have installed. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From blweiner at gmail.com Thu Mar 8 23:06:08 2007 From: blweiner at gmail.com (Ben) Date: Fri, 9 Mar 2007 05:06:08 +0100 Subject: [Ferret-talk] higlighting problem Message-ID: <3ba6234d1c29e0b97fbebe00d9a4d75a@ruby-forum.com> Hi, I've been having a problem getting highlighting to work with aaf. I have a class defined as follows:

    class Link < ActiveRecord::Base
      acts_as_ferret :fields => { :description => { :store => :yes } }
    end

I get back the correct results when I do Link.find_by_contents; however, I'd like to highlight them. If I iterate through the list of results and call result.highlight("myquery", :field => :description), it returns nil for each result. How is this possible, if these results are correctly returned because "myquery" is a token in their description? Am I using the API incorrectly? Thanks for the help, Ben -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Mar 9 04:38:21 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 9 Mar 2007 10:38:21 +0100 Subject: [Ferret-talk] higlighting problem In-Reply-To: <3ba6234d1c29e0b97fbebe00d9a4d75a@ruby-forum.com> References: <3ba6234d1c29e0b97fbebe00d9a4d75a@ruby-forum.com> Message-ID: <20070309093821.GA20174@cordoba.webit.de> On Fri, Mar 09, 2007 at 05:06:08AM +0100, Ben wrote: > Hi, > > I've been having a problem getting highlighting to work with aaf. > > I have a class defined as follows such: > > class Link < ActiveRecord::Base > acts_as_ferret :fields => { :description => { :store => :yes } } > end > > I get back the correct results when I do Link.find_by_contents, however, > I'd like to highlight them.
> > If I do something like iterate through the list of results and call > result.highlight("myquery", :field => :description), but this is > returning nil for each result. How is this possible if these results are > correctly returned because "myquery" is a token in their description? Am > I incorrectly using the api? Your usage of the API is correct. I'm not sure what's going on there, and I can't reproduce it here. To help debug this a bit, could you please try this in the console:

    results = Link.find_by_contents(query)
    result = results.first
    result.highlight(query, :field => :description)  # returns nil
    doc_num = result.document_number
    # if you are on aaf trunk:
    Link.aaf_index.ferret_index.highlight(query, doc_num, :field => :description)
    # if on aaf stable:
    Link.ferret_index.highlight(query, doc_num, :field => :description)

This calls Ferret's highlight method directly. By the way, which version of aaf are you using? Jens From kraemer at webit.de Fri Mar 9 05:04:20 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 9 Mar 2007 11:04:20 +0100 Subject: [Ferret-talk] bug or "feature"? In-Reply-To: References: <45EDC578.40708@benjaminkrause.com> <3448057cc7285a7842df9e190db69797@ruby-forum.com> <921e0f4ff2927766886b97f6a85876df@ruby-forum.com> <7e72ec656c3b196aa2f5ee2aa32bdfd2@ruby-forum.com> <2ea6fc3717b00e2e4934f54d9fe4469f@ruby-forum.com> <20070308090249.GP28036@cordoba.webit.de> Message-ID: <20070309100420.GB20174@cordoba.webit.de> On Thu, Mar 08, 2007 at 02:22:45PM +0100, mix wrote: > Jens Kraemer wrote: > > could you please have a look into your application's log file, and look > > for the query aaf actually runs against the ferret index?
the relevant > > line should start with 'query: ' > > > > Jens > > > > > "Query: +all_categories:1 +bought_at_int:<1174655577 +gift:false" > > with this and a search '+' it found all books (it changes '+' with ' ') > same for '?' > > > > > now i've changed the query with: > > query = "+title:#{query} +all_categories:#{category_id} > +bought_at_int:<#{15.days.from_now.to_i} +gift:false #{extra}" > this works, now with '?' it doesn't found anything, but... if i search a > stop word (my, all, your, etc) it found all books :( > > the log: > > Query: +title:all +all_categories:1 +bought_at_int:<1174655577 > +gift:false I think this is perfectly correct behaviour. The stop word can't be searched for, since it gets stripped from your query (and it has been stripped from all your indexed documents, too, so searching for it would be pointless). What remains is the query '+all_categories:1 +bought_at_int:<1174655577 +gift:false', which will then be run against the index. You can tell Ferret not to use any stop words by specifying an empty list of stop words:

    # Parentheses are needed: a bare { right after the method name is parsed as a block.
    acts_as_ferret({ :fields => { ... } },
                   { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) })

Jens From david.krmpotic at gmail.com Fri Mar 9 06:20:09 2007 From: david.krmpotic at gmail.com (D. Krmpotic) Date: Fri, 9 Mar 2007 12:20:09 +0100 Subject: [Ferret-talk] Newbie Message-ID: Hi! Suppose I have many articles in a database and I want to know how many times each search term appears in each one of them. Where do I start? Thank you very much! David -- Posted via http://www.ruby-forum.com/. From no at spam.thanks Fri Mar 9 07:09:03 2007 From: no at spam.thanks (david) Date: Fri, 9 Mar 2007 13:09:03 +0100 Subject: [Ferret-talk] bug or "feature"?
In-Reply-To: <20070309100420.GB20174@cordoba.webit.de> References: <45EDC578.40708@benjaminkrause.com> <3448057cc7285a7842df9e190db69797@ruby-forum.com> <921e0f4ff2927766886b97f6a85876df@ruby-forum.com> <7e72ec656c3b196aa2f5ee2aa32bdfd2@ruby-forum.com> <2ea6fc3717b00e2e4934f54d9fe4469f@ruby-forum.com> <20070308090249.GP28036@cordoba.webit.de> <20070309100420.GB20174@cordoba.webit.de> Message-ID: <5ce3fa9af7871aae04c38332cf941e13@ruby-forum.com> Jens Kraemer wrote: > I guess this is perfectly correct behaviour. The stop word can't be > searched for, since it gets stripped from your query (and it has been > stripped from all your indexed documents, too, so searching for it would > be pointless). What remains is the query > '+all_categories:1 +bought_at_int:<1174655577 +gift:false' which will > then be run against the index. > > You can tell ferret to not use any stop words by specifying an empty > list for the stop words: > > acts_as_ferret { :fields => { ... } }, { :analyzer => > StandardAnalyzer.new([]) } > ok, thanks, i'll try :) -- Posted via http://www.ruby-forum.com/. From john at johnleach.co.uk Fri Mar 9 09:17:53 2007 From: john at johnleach.co.uk (John Leach) Date: Fri, 09 Mar 2007 14:17:53 +0000 Subject: [Ferret-talk] script to reproduce segfaults in 0.11.3 Message-ID: <1173449873.22147.6.camel@localhost.localdomain> Hi, I've adapted my ferret_killer1.rb script to reproduce the segfaults I'm still having in my app with Ferret 0.11.3. ferret_killer1 didn't reproduce these - I had to add code to retrieve the documents from the index after searching (and fixed the index creation bug Dave spotted). http://johnleach.co.uk/downloads/ruby/ferret/ferret_killer3.rb To use, in one terminal run: ./ferret_killer3.rb index In another terminal: ./ferret_killer3.rb search the search process will segfault after a couple of searches or so: ferret_killer3.rb:64: [BUG] Segmentation fault ruby 1.8.5 (2006-08-25) [i486-linux] Hope this helps, John. 
-- http://johnleach.co.uk From dbalmain.ml at gmail.com Fri Mar 9 10:37:32 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 10 Mar 2007 02:37:32 +1100 Subject: [Ferret-talk] script to reproduce segfaults in 0.11.3 In-Reply-To: <1173449873.22147.6.camel@localhost.localdomain> References: <1173449873.22147.6.camel@localhost.localdomain> Message-ID: On 3/10/07, John Leach wrote: > Hi, > > I've adapted my ferret_killer1.rb script to reproduce the segfaults I'm > still having in my app with Ferret 0.11.3. > > ferret_killer1 didn't reproduce these - I had to add code to retrieve > the documents from the index after searching (and fixed the index > creation bug Dave spotted). > > http://johnleach.co.uk/downloads/ruby/ferret/ferret_killer3.rb Hi John, Thanks for the bug report. This has been fixed in trunk. I'll try and get a new release out soon. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Fri Mar 9 11:01:06 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 10 Mar 2007 03:01:06 +1100 Subject: [Ferret-talk] Newbie In-Reply-To: References: Message-ID: On 3/9/07, D. Krmpotic wrote: > Hi! > > > Suppose I have many articles in a database and I want to know how many > times each search term appears in each one of them. Where do I start? Here is an example:

    require 'rubygems'
    require 'ferret'

    index = Ferret::I.new
    index << {:content => 'one two three'}
    index << {:content => 'no 3 here'}
    index << {:content => 'three three three'}

    index.reader.term_docs_for(:content, 'three').each do |doc, freq|
      puts "three appeared #{freq} times in document #{doc}"
    end

-- Dave Balmain http://www.davebalmain.com/ From caleb at inforadical.net Fri Mar 9 14:22:14 2007 From: caleb at inforadical.net (Caleb Clausen) Date: Fri, 09 Mar 2007 11:22:14 -0800 Subject: [Ferret-talk] memory leak in index build?
Message-ID: <45F1B3E6.7020305@inforadical.net> I have a script (below) which attempts to make an index out of all the man pages on my system. It takes a while, mostly because it runs man over and over... but anyway, as time goes on the memory usage goes up and up and never down. Eventually, it runs out of RAM and just starts thrashing the swap space, pretty much grinding to a halt. The workaround would seem to be to index documents in batches in the background, shutting down the index process every so often to recover its memory. I'm about to try that, because I'm really hunting a different bug... however, the memory problem concerns me.

require 'rubygems'
require 'ferret'
require 'set'

dir = "temp_index"
if ARGV.first == "-p"
  ARGV.shift
  prefix = ARGV.shift
end

fi = Ferret::Index::FieldInfos.new
fi.add_field :name, :index => :yes, :store => :yes, :term_vector => :with_positions
%w[data field1 field2 field3].each { |fieldname|
  fi.add_field fieldname.to_sym, :index => :yes, :store => :no, :term_vector => :with_positions
}
i = Ferret::Index::IndexWriter.new(:path => dir, :create => true, :field_infos => fi)

list = Dir["/usr/share/man/*/#{prefix}*.gz"]
numpages = (ARGV.last || list.size).to_i
list[0...numpages].each { |manfile|
  all, name, section = */\A(.*)\.([^.]+)\Z/.match(File.basename(manfile, ".gz"))
  # Strip man's backspace overstrikes ("X<BS>X" bold, "_<BS>X" underline).
  # In a Ruby regexp \b is a word boundary, so the backspace is written \x08.
  tttt = `man #{section} #{name}`.gsub(/.\x08/m, '')
  i << {
    :data => tttt.to_s, :name => name,
    :field1 => name, :field2 => name, :field3 => name,
  }
}
i.close

i = Ferret::Index::IndexReader.new dir
i.max_doc.times { |n|
  # Total number of term positions in the :data field of document n.
  count = i.term_vector(n, :data).terms.inject(0) { |sum, tvt| sum + tvt.positions.size }
  puts "heinous term count for #{i[n][:name]}" if count > 1_000_000
}

seenterms = Set[]
begin
  i.terms(:data).each { |term, df|
    seenterms.include? term and next
    i.term_docs_for(:data, term)
    seenterms << term
  }
rescue Exception
  raise
end

From dbalmain.ml at gmail.com Fri Mar 9 18:09:11 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 10 Mar 2007 10:09:11 +1100 Subject: [Ferret-talk] memory leak in index build?
In-Reply-To: <45F1B3E6.7020305@inforadical.net> References: <45F1B3E6.7020305@inforadical.net> Message-ID: On 3/10/07, Caleb Clausen wrote: > I have a script (below) which attempts to make an index out of all the > man pages on my system. It takes a while, mostly because it runs man > over and over... but anyway, as time goes on the memory usage goes up > and up and never down. Eventually, it runs out of ram and just starts > thrashing up the swap space, pretty much grinding to a halt. Hey Caleb, Running your test for 15 minutes, my memory usage climbed to 30 MB. It was still slowly climbing, which is not a good sign, but not enough to bring my system to a halt. Anyway, I tried using valgrind's memcheck on it and I couldn't find a leak in the Ferret code. Perhaps it is a leak in your version of Ruby, although I doubt it. Here is the most significant output from valgrind with --show-reachable=yes set:

==7636== 110,880 bytes in 6,930 blocks are still reachable in loss record 15 of 20
==7636==    at 0x4020396: malloc (vg_replace_malloc.c:149)
==7636==    by 0x40C175F: st_insert (st.c:288)
==7636==    by 0x40D1E55: rb_ivar_set (variable.c:1056)
==7636==    by 0x40D1FC2: rb_iv_set (variable.c:1959)
==7636==    by 0x40D2003: rb_name_class (variable.c:282)
==7636==    by 0x408BCBB: boot_defclass (object.c:2462)
==7636==    by 0x408D020: Init_Object (object.c:2549)
==7636==    by 0x40798A0: rb_call_inits (inits.c:54)
==7636==    by 0x4061E5C: ruby_init (eval.c:1382)
==7636==    by 0x8048600: main (in /usr/bin/ruby1.8)
==7636==
==7636==
==7636== 187,248 bytes in 11,703 blocks are still reachable in loss record 16 of 20
==7636==    at 0x4020396: malloc (vg_replace_malloc.c:149)
==7636==    by 0x40C184F: st_init_table_with_size (st.c:154)
==7636==    by 0x40C18B6: st_init_strtable_with_size (st.c:193)
==7636==    by 0x4095FBD: Init_sym (parse.y:5885)
==7636==    by 0x4079896: rb_call_inits (inits.c:52)
==7636==    by 0x4061E5C: ruby_init (eval.c:1382)
==7636==    by 0x8048600: main (in /usr/bin/ruby1.8)
==7636==
==7636==
==7636== 514,228 bytes in 11,687 blocks are still reachable in loss record 17 of 20
==7636==    at 0x401F6D5: calloc (vg_replace_malloc.c:279)
==7636==    by 0x40C1870: st_init_table_with_size (st.c:158)
==7636==    by 0x40C1914: st_init_table (st.c:167)
==7636==    by 0x40C196F: st_init_numtable (st.c:173)
==7636==    by 0x40CFEB6: Init_var_tables (variable.c:28)
==7636==    by 0x407989B: rb_call_inits (inits.c:53)
==7636==    by 0x4061E5C: ruby_init (eval.c:1382)
==7636==    by 0x8048600: main (in /usr/bin/ruby1.8)
==7636==
==7636==
==7636== 965,584 bytes in 60,349 blocks are still reachable in loss record 18 of 20
==7636==    at 0x4020396: malloc (vg_replace_malloc.c:149)
==7636==    by 0x40C1692: st_add_direct (st.c:307)
==7636==    by 0x4095D1A: rb_intern (parse.y:6067)
==7636==    by 0x40CFED7: Init_var_tables (variable.c:30)
==7636==    by 0x407989B: rb_call_inits (inits.c:53)
==7636==    by 0x4061E5C: ruby_init (eval.c:1382)
==7636==    by 0x8048600: main (in /usr/bin/ruby1.8)
==7636==
==7636==
==7636== 1,088,800 bytes in 50,609 blocks are still reachable in loss record 19 of 20
==7636==    at 0x4020396: malloc (vg_replace_malloc.c:149)
==7636==    by 0x4074E50: ruby_xmalloc (gc.c:121)
==7636==    by 0x40CF72F: ruby_strdup (util.c:634)
==7636==    by 0x4095CFF: rb_intern (parse.y:6066)
==7636==    by 0x40CFED7: Init_var_tables (variable.c:30)
==7636==    by 0x407989B: rb_call_inits (inits.c:53)
==7636==    by 0x4061E5C: ruby_init (eval.c:1382)
==7636==    by 0x8048600: main (in /usr/bin/ruby1.8)
==7636==
==7636==
==7636== 2,374,520 bytes in 4 blocks are still reachable in loss record 20 of 20
==7636==    at 0x4020396: malloc (vg_replace_malloc.c:149)
==7636==    by 0x40737F9: add_heap (gc.c:351)
==7636==    by 0x4061D74: ruby_init (eval.c:1372)
==7636==    by 0x8048600: main (in /usr/bin/ruby1.8)

As you can see, none of this has anything to do with Ferret.
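A cheaper first check than valgrind is to confirm that memory really keeps growing, by logging the process's resident set size from inside the indexing loop. A minimal, Linux-only sketch, assuming /proc/self/status is available; rss_kb and the loop below are hypothetical stand-ins, not part of Ferret, and a rising number only shows that memory grows, not where it goes.

```ruby
# Linux-only sketch: read this process's resident set size (in kB) from
# /proc/self/status. Returns nil where /proc is unavailable.
def rss_kb
  path = '/proc/self/status'
  return nil unless File.readable?(path)
  line = File.readlines(path).find { |l| l.start_with?('VmRSS:') }
  line && line.split[1].to_i
end

# Hypothetical stand-in for an indexing loop: log RSS every 1,000 documents.
docs = []
5_000.times do |n|
  docs << "document #{n} " * 20
  puts "after #{n + 1} docs: #{rss_kb.inspect} kB" if (n + 1) % 1_000 == 0
end
```

In a real indexing script the docs array would be replaced by the writer's << calls; a number that keeps climbing after the writer is flushed and closed is the signal worth investigating further.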
If you haven't used valgrind before and you want to try it there, here is how:

    valgrind --leak-check=yes ruby calebs_test.rb 2> res

You'll probably want to capture the output (like I have here), as it is *very* long for Ruby scripts, with lots of warnings from the Ruby internals. Let me know if you try this and you find anything unusual. Incidentally, I'm not sure what the other bug you are chasing is, but it may have something to do with the encoding of the man pages. I don't think they are UTF-8, so if your locale is set to UTF-8 it will cause some problems in the analysis. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From lajanus at o2.pl Sat Mar 10 14:12:17 2007 From: lajanus at o2.pl (Linus) Date: Sat, 10 Mar 2007 20:12:17 +0100 Subject: [Ferret-talk] Test fail on debian 3.1 In-Reply-To: References: <39736ca639df5624bd5ca6e8b2622a92@ruby-forum.com> Message-ID: David Balmain wrote: > You need to have a UTF-8 locale installed or Ferret doesn't know how > to deal with UTF-8 characters. Try typing locale at the command line > to see what locale you have installed. > That was it, thanks! -- Posted via http://www.ruby-forum.com/. From david.krmpotic at gmail.com Sun Mar 11 12:22:44 2007 From: david.krmpotic at gmail.com (D. Krmpotic) Date: Sun, 11 Mar 2007 17:22:44 +0100 Subject: [Ferret-talk] Newbie In-Reply-To: References: Message-ID: Thank you, David! That will get me started. -- Posted via http://www.ruby-forum.com/. From xbelanch at gmail.com Mon Mar 12 12:02:53 2007 From: xbelanch at gmail.com (Xavier Belanche) Date: Mon, 12 Mar 2007 17:02:53 +0100 Subject: [Ferret-talk] index.rb:384 [BUG] Message-ID: <9f7eac58b1f5966123b9aa1366264e0a@ruby-forum.com> Hi folks, I've been working and playing with acts_as_ferret, following this fantastic tutorial: http://www.railsenvy.com/2007/2/19/acts-as-ferret-tutorial When I try to implement the field storage tip, it crashes. So I tried it in script/console: 1.
I have a simple model called Articles: class Article < ActiveRecord::Base acts_as_ferret :fields => ['title'] end 2. In the console, I try this: >> index = Article.ferret_index => #, @mon_entering_queue=[], @default_field=["title"], @key=:id, @mon_count=0, @auto_flush=true, @open=true, @close_dir=true, @id_field=:id, @mon_owner=nil, @reader=nil, @searcher=nil, @options={:lock_retry_time=>2, :path=>"script/../config/../config/../index/development/article", :create_if_missing=>true, :default_field=>["title"], :analyzer=>#, :auto_flush=>true, :or_default=>false, :dir=>#, :key=>:id, :handle_parse_errors=>true}, @mon_waiting_queue=[]> (it's ok, seems to run ok! :) >> query = "ruby" => "ruby" >> options ="" (mmmh, just enough! Now.... ) >> index.search_each(query, options) do |doc, score| ?> puts index[doc][:title] >> end And the next and horrible message! /usr/local/lib/site_ruby/1.8/ferret/index.rb:384: [BUG] Segmentation fault ruby 1.8.4 (2005-12-24) [x86_64-linux] Woops! Someone has the same bug!???? For more tech information: Rails 1.2.2 Gem 0.9.2 acts_as_ferret (0.3.1) ferret (0.11.3) Thanks everyone! -- Posted via http://www.ruby-forum.com/. From bk at benjaminkrause.com Mon Mar 12 18:54:47 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Mon, 12 Mar 2007 18:54:47 -0400 Subject: [Ferret-talk] Too many open files error Message-ID: Hi Dave, i just stumbled across a new error i haven't seen before :) caught error inside loop: IO Error occured at :93 in xraise Error occured in fs_store.c:264 - fs_new_output couldn't create OutStream /var/www/localhost/rails/current/ script/backgroundrb/../../config/../db/ferret.index.production/ _jei_0.f0: my ulimit is set to 1024 files, the error occurs regularly.. any idea? 
Ben From bk at benjaminkrause.com Mon Mar 12 19:03:08 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Mon, 12 Mar 2007 19:03:08 -0400 Subject: [Ferret-talk] Too many open files error In-Reply-To: References: Message-ID: <76A8016E-A481-476A-9B7A-ABA8CBAB5BB6@benjaminkrause.com> On Mar 12, 2007, at 18:54, Benjamin Krause wrote: > Hi Dave, > > i just stumbled across a new error i haven't seen before :) > > caught error inside loop: IO Error occured at :93 in xraise > Error occured in fs_store.c:264 - fs_new_output > couldn't create OutStream /var/www/localhost/rails/current/ > script/backgroundrb/../../config/../db/ferret.index.production/ > _jei_0.f0: > > my ulimit is set to 1024 files, the error occurs regularly.. any idea? ah, and i should add, this is from our backgroundrb indexing process there is basically nothing else than opening and closing the index, all of the time. this isn't necessarily a ferret problem, but the process isn't doing much else. ben From caleb at inforadical.net Mon Mar 12 20:17:13 2007 From: caleb at inforadical.net (Caleb Clausen) Date: Mon, 12 Mar 2007 17:17:13 -0700 Subject: [Ferret-talk] memory leak in index build? In-Reply-To: References: Message-ID: <45F5ED89.5010004@inforadical.net> Dave Balmain said: > Running your test for 15 minutes my memory usage climbed to 30Mb. It > was still slowly climbing which is not a good sign but not enough to > bring my system to a halt. Anyway, I tried using valgrind's memcheck > on it and I couldn't find a leak in the Ferret code. Perhaps it is a > leak in your version of Ruby, although I doubt it. Here is the most > significant output from valgrind with --show-reachable=yes set; Ok, so my ruby is version 1.8.2, kinda old, so maybe there is an old bug in it. Recent experiments on another machine (running a newer ruby, 1.8.5, I think) didn't seem to have the same memory leak. What version do you run, by the way? 
> Incidentally, I'm not sure what the other bug you are chasing is but > it may have something to do with the encoding of the man pages. I I know the man output is some encoding I don't understand; I'm just trying to generate a lot of data to feed into ferret. I don't care if it's correct. I'm still having quite a few crashes with ferret, though the situation has improved. I'm trying to reproduce those without handing you my entire codebase. So far, without success. :( > don't think they are UTF-8 so if your locale is set to UTF-8 it will > cause some problems in the analysis. I know I'm not on the UTF-8 locale. Actually, I've been trying to figure out how to set my locale to UTF-8. I don't suppose you'd know? I'm using Debian stable. From julioody at gmail.com Tue Mar 13 01:07:01 2007 From: julioody at gmail.com (Julio Cesar Ody) Date: Tue, 13 Mar 2007 16:07:01 +1100 Subject: [Ferret-talk] index returns all results for specific queries Message-ID: Hey all, I'm getting some really weird results when searching documents. It *seems* to be somehow related to the document format I'm using. 
I wrote a small script to replicate it:

################
#!/usr/bin/ruby

require 'rubygems'
require 'ferret'
include Ferret
index = Index::Index.new(:path => '/tmp/fooindex', :key => :id)

# dummy data
index << {:visibility=>"private", :type=>"media", :title=>"example title", :owner=>"user/3003", :author=>"user/3003", :description=>"description example", :id=>"user/3003/media/1"}
index << {:visibility=>"private", :type=>"media", :title=>"a new title", :owner=>"user/3003", :author=>"user/3003", :description=>"more foo desc", :id=>"user/3003/media/2"}
index << {:visibility=>"private", :type=>"media", :title=>"random title", :owner=>"user/3003", :author=>"user/3003", :description=>"random description", :id=>"user/3003/media/4"}
index << {:visibility=>"private", :type=>"media", :title=>"random title", :owner=>"user/3003", :author=>"user/3003", :description=>"random description", :id=>"user/3003/media/5"}

index.search_each(ARGV.shift) { |doc, score|
  puts index[doc].load.inspect
}
################

The following queries are returning *all* the results currently in the index:

$ ruby script.rb "title:me"
{:author=>"user/3003", :description=>"description example", :visibility=>"private", :id=>"user/3003/media/1", :title=>"example title", :type=>"media", :owner=>"user/3003"}
... (remaining results)
$ ruby script.rb "title:my"
(same as above)

Oddly enough, the following

$ ruby script.rb "title:mo"

won't return anything. There are more variants of this, but I think you get my meaning. The following works properly:

$ ruby script.rb "title:random"

(returns the two results that contain "random" in the title, which is what is supposed to happen)

Is there something I'm missing? It doesn't seem to make sense to me that those queries above should return all the results in the index, especially considering they don't actually match anything. Any help is appreciated. Thanks. -- Julio C.
Ody From dbalmain.ml at gmail.com Tue Mar 13 02:20:51 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 13 Mar 2007 17:20:51 +1100 Subject: [Ferret-talk] memory leak in index build? In-Reply-To: <45F5ED89.5010004@inforadical.net> References: <45F5ED89.5010004@inforadical.net> Message-ID: On 3/13/07, Caleb Clausen wrote: > Dave Balmain said: > > Running your test for 15 minutes my memory usage climbed to 30Mb. It > > was still slowly climbing which is not a good sign but not enough to > > bring my system to a halt. Anyway, I tried using valgrind's memcheck > > on it and I couldn't find a leak in the Ferret code. Perhaps it is a > > leak in your version of Ruby, although I doubt it. Here is the most > > significant output from valgrind with --show-reachable=yes set; > > Ok, so my ruby is version 1.8.2, kinda old, so maybe there is an old bug > in it. Recent experiments on another machine (running a newer ruby, > 1.8.5, I think) didn't seem to have the same memory leak. > > What version do you run, by the way? I'm on 1.8.5. > > Incidentally, I'm not sure what the other bug you are chasing is but > > it may have something to do with the encoding of the man pages. I > > I know the man output is some encoding I don't understand; I'm just > trying to generate a lot of data to feed into ferret. I don't care if > it's correct. I'm still having quite a few crashes with ferret, though > the situation has improved. I'm trying to reproduce those without > handing you my entire codebase. So far, without success. :( Let me know when you do find the problem. It is possible that it has something to do with a mismatch of encodings. Feeding ISO-8859-1 data (which is what my man pages are encoded in) to a UTF-8 analyzer might cause Ferret to crash. I've tried to fix this so that it doesn't happen, but I might have missed something. > > don't think they are UTF-8 so if your locale is set to UTF-8 it will > > cause some problems in the analysis.
> > I know I'm not on the UTF-8 locale. Actually, I've been trying to figure > out how to set my locale to UTF-8. I don't suppose you'd know? I'm using > Debian stable. It's not too hard. Something like:

    $ sudo apt-get install debconf
    $ sudo dpkg-reconfigure locales

Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Tue Mar 13 02:38:45 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 13 Mar 2007 17:38:45 +1100 Subject: [Ferret-talk] index returns all results for specific queries In-Reply-To: References: Message-ID: On 3/13/07, Julio Cesar Ody wrote: > Hey all, > > I'm getting some really weird results when searching documents. It > *seems* to be somehow related to the document format I'm using. > > I wrote a small script to replicate it: > > ################ > #!/usr/bin/ruby > > require 'rubygems' > require 'ferret' > include Ferret > index = Index::Index.new(:path => '/tmp/fooindex', :key => :id) > > # dummy data > index << {:visibility=>"private", :type=>"media", :title=>"example > title", :owner=>"user/3003", :author=>"user/3003", > :description=>"description example", :id=>"user/3003/media/1"} > index << {:visibility=>"private", :type=>"media", :title=>"a new > title", :owner=>"user/3003", :author=>"user/3003", :description=>"more > foo desc", :id=>"user/3003/media/2"} > index << {:visibility=>"private", :type=>"media", :title=>"random > title", :owner=>"user/3003", :author=>"user/3003", > :description=>"random description", :id=>"user/3003/media/4"} > index << {:visibility=>"private", :type=>"media", :title=>"random > title", :owner=>"user/3003", :author=>"user/3003", > :description=>"random description", :id=>"user/3003/media/5"} > > index.search_each(ARGV.shift) { |doc, score| > puts index[doc].load.inspect > } > ################ Thanks for including the script. It makes my job much easier.
:) > The following queries are returning *all* the results currently in the index: > > $ ruby script.rb "title:me" > {:author=>"user/3003", :description=>"description example", > :visibility=>"private", :id=>"user/3003/media/1", :title=>"example > title", :type=>"media", :owner=>"user/3003"} > ... (remaining results) > $ ruby script.rb "title:my" > (same as above) > > And weird enough, the following > > $ ruby script.rb "title:mo" > > Won't return anything. There's more variants to that, but I think you > get my meaning. The problem is that 'me' and 'my' are stop words. When they get removed, the query becomes 'title:', which is invalid. By default Ferret catches query parse exceptions and attempts to parse the query as a simple boolean term query, removing all special characters, so this query then becomes 'title'. Since "title" can be found in the title field of every document, all documents are returned. So I don't think this is a bug, but it is definitely undesirable behaviour. I'll try and think of a better way to parse this. In the meantime, you may want to change the stop-word list or remove stop words altogether to prevent this problem. -- Dave Balmain http://www.davebalmain.com/ From kraemer at webit.de Tue Mar 13 04:40:14 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 13 Mar 2007 09:40:14 +0100 Subject: [Ferret-talk] index.rb:384 [BUG] In-Reply-To: <9f7eac58b1f5966123b9aa1366264e0a@ruby-forum.com> References: <9f7eac58b1f5966123b9aa1366264e0a@ruby-forum.com> Message-ID: <20070313084014.GD1352@cordoba.webit.de> On Mon, Mar 12, 2007 at 05:02:53PM +0100, Xavier Belanche wrote: > Hi folks, > I've working and playing with acts_as_ferret and follow this fantastic > tutorial: > http://www.railsenvy.com/2007/2/19/acts-as-ferret-tutorial > > When I try to implement the field storage tip, it crash. So, I try to > make it via script/console: > > 1.
I have a simple model called Article: > class Article < ActiveRecord::Base > acts_as_ferret :fields => ['title'] > end You'll have to tell aaf to store field values in the index: acts_as_ferret :fields => { :title => { :store => :yes } } Jens From dbalmain.ml at gmail.com Tue Mar 13 06:27:32 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 13 Mar 2007 21:27:32 +1100 Subject: [Ferret-talk] Too many open files error In-Reply-To: <76A8016E-A481-476A-9B7A-ABA8CBAB5BB6@benjaminkrause.com> References: <76A8016E-A481-476A-9B7A-ABA8CBAB5BB6@benjaminkrause.com> Message-ID: On 3/13/07, Benjamin Krause wrote: > > On Mar 12, 2007, at 18:54, Benjamin Krause wrote: > > > Hi Dave, > > > > I just stumbled across a new error I haven't seen before :) > > > > caught error inside loop: IO Error occured at :93 in xraise > > Error occured in fs_store.c:264 - fs_new_output > > couldn't create OutStream /var/www/localhost/rails/current/script/backgroundrb/../../config/../db/ferret.index.production/_jei_0.f0: > > > > My ulimit is set to 1024 files, and the error occurs regularly. Any idea? > > Ah, and I should add, this is from our backgroundrb indexing process; > there is basically nothing else than opening and closing the index, all > of the time. This isn't necessarily a Ferret problem, but the process > isn't doing much else. This is not a bug but rather a limitation of the operating system. There are a few solutions. If you are getting this problem you should definitely be sure to set :use_compound_file to true (which is the default setting). You might also like to lower the merge_factor, which defaults to 10. Having a lower merge_factor will slow indexing a little but it will actually make search faster. Try setting it to 4.
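The remedies in this reply (compound files, a lower merge_factor, a higher file-handle limit) can be sketched in Ruby. The block below raises the per-process open-file limit from inside the indexing process with Process.getrlimit and Process.setrlimit from Ruby's core Process module, which need a Unix-like system and a current Ruby rather than the 1.8.4 used in this thread; the shell equivalent is ulimit -n. raise_open_file_limit is a hypothetical helper name, and the Ferret options in the comment assume that Index::Index.new passes :use_compound_file and :merge_factor through to its IndexWriter.

```ruby
# Sketch: raise this process's soft limit on open file descriptors
# (RLIMIT_NOFILE) toward a target, never above the hard limit and never
# lowering a limit that is already high enough.
# raise_open_file_limit is a hypothetical helper, not part of Ferret or aaf.
def raise_open_file_limit(target)
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  return soft if soft >= target

  # An unprivileged process may raise its soft limit up to the hard limit.
  new_soft = [target, hard].min
  Process.setrlimit(Process::RLIMIT_NOFILE, new_soft, hard)
  Process.getrlimit(Process::RLIMIT_NOFILE).first
end

# The index-side remedies would be options on the index itself, e.g.
# (assuming Index::Index.new forwards them to its IndexWriter):
#   Ferret::Index::Index.new(:path => index_path,
#                            :use_compound_file => true,
#                            :merge_factor => 4)

puts "open-file soft limit: #{raise_open_file_limit(4096)}"
```

Raising the limit only changes what this process (and any children it starts) may open; the compound-file and merge_factor settings instead reduce how many files the index needs in the first place.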
Alternatively, in your situation, I would probably just increase the file handle limit. I'm sure you'd have enough memory to do that. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Tue Mar 13 06:59:36 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 13 Mar 2007 21:59:36 +1100 Subject: [Ferret-talk] index.rb:384 [BUG] In-Reply-To: <9f7eac58b1f5966123b9aa1366264e0a@ruby-forum.com> References: <9f7eac58b1f5966123b9aa1366264e0a@ruby-forum.com> Message-ID: On 3/13/07, Xavier Belanche wrote: > > >> options ="" > > (mmmh, just enough! Now.... ) > >> index.search_each(query, options) do |doc, score| > ?> puts index[doc][:title] > >> end > > And the next and horrible message! > > /usr/local/lib/site_ruby/1.8/ferret/index.rb:384: [BUG] Segmentation > fault > ruby 1.8.4 (2005-12-24) [x86_64-linux] options must be a Hash or nil, not a String. One of the things I still need to do with Ferret is add argument checking. At the moment, if you pass a string when a hash is expected, you'll get a segfault or bus error like this. I do plan to fix this in the future. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From jw at innerewut.de Tue Mar 13 07:59:24 2007 From: jw at innerewut.de (Jonathan Weiss) Date: Tue, 13 Mar 2007 12:59:24 +0100 Subject: [Ferret-talk] memory leak in index build? In-Reply-To: References: <45F5ED89.5010004@inforadical.net> Message-ID: <45F6921C.6030304@innerewut.de> > > It's not too hard. Something like: > > $ sudo apt-get install debconf > $ sudo dpkg-reconfigure locales On the subject of the locale stuff, would it be possible to create a configuration option that explicitly sets Ferret to UTF-8 mode? I think that a lot of people have been bitten by this and an explicit configuration option IMHO makes a lot of sense.
With acts_as_ferret it might look like this: class A < ActiveRecord::Base acts_as_ferret :encoding => 'utf8' end > > Cheers, > Dave > Regards, Jonathan -- Jonathan Weiss http://blog.innerewut.de From jeroen at laika.nl Tue Mar 13 12:20:51 2007 From: jeroen at laika.nl (jeroen janssen) Date: Tue, 13 Mar 2007 17:20:51 +0100 Subject: [Ferret-talk] Acts_as_ferret and auto-flush Message-ID: <623EA2CB-0687-4828-8FB9-6AC8BA35C76E@laika.nl> Hi, I'm using acts_as_ferret with Mongrel and I'm getting locking errors that after a while result in a corrupt database. I know about the problem with different processes writing to the index, but I haven't been able to get the DRb server working properly yet. I read on this list that another solution is to set :auto_flush to true, but I'm not sure how to do this. As I understand it, I have to do this for the Index class and not for the model that acts_as_ferret, right? How exactly do I do this? Do I just have to make a new Index model? I hope someone can help me out. -- Jeroen Janssen From xbelanch at gmail.com Tue Mar 13 14:44:31 2007 From: xbelanch at gmail.com (Xavier Belanche) Date: Tue, 13 Mar 2007 19:44:31 +0100 Subject: [Ferret-talk] index.rb:384 [BUG] In-Reply-To: References: <9f7eac58b1f5966123b9aa1366264e0a@ruby-forum.com> Message-ID: <1123b0ee30e7d6424eec68c34386e441@ruby-forum.com> David Balmain wrote: > On 3/13/07, Xavier Belanche wrote: >> /usr/local/lib/site_ruby/1.8/ferret/index.rb:384: [BUG] Segmentation >> fault >> ruby 1.8.4 (2005-12-24) [x86_64-linux] > > options must be a Hash or nil, not a String. One of the things I still > need to do with Ferret is add argument checking. At the moment, if you > pass a string when a hash is expected, you'll get a segfault or bus > error like this. I do plan to fix this in the future. > > Cheers, > Dave Thanks David. It's as you say: if I pass options as a Hash or nil, the bus error doesn't appear.
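Until Ferret checks its own arguments, a caller can do the check on the Ruby side, so that a mistake like options = "" raises an ordinary exception instead of crashing the interpreter. A minimal sketch: checked_search_options is a hypothetical helper, and only the Hash-or-nil rule comes from Dave's reply.

```ruby
# Hypothetical guard (not part of Ferret): validate the options argument in
# Ruby before it reaches C code that assumes a Hash.
def checked_search_options(options)
  case options
  when nil, Hash
    options
  else
    raise ArgumentError,
          "options must be a Hash or nil, not a #{options.class}"
  end
end

# Intended use, wrapping the call from the thread:
#   index.search_each(query, checked_search_options(options)) do |doc, score|
#     puts index[doc][:title]
#   end

begin
  checked_search_options("") # the mistake that caused the segfault above
rescue ArgumentError => e
  puts e.message # prints: options must be a Hash or nil, not a String
end
```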
On the other hand, I don't understand why I always receive a nil result when I use this method in the same way: $ ./script/console Loading development environment. >> article = Article.find 1 => #"Ruby on Rails", "id"=>"1", "content"=>"Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nullam tempor risus et ante. Maecenas consectetuer feugiat orci. Fusce vehicula velit id odio. Phasellus ut mauris. Aenean faucibus dolor quis nibh. Praesent convallis est id ante. In felis. "}> OK! I put these lines in the Article model :) class Article < ActiveRecord::Base acts_as_ferret :fields => { :title=> {:store=> :yes} } end I went back to the console and entered the following lines: >> index = Article.ferret_index => #, @mon_entering_queue=[], @default_field=[:title], @key=:id, @mon_count=0, @auto_flush=true, @open=true, @close_dir=true, @id_field=:id, @mon_owner=nil, @reader=nil, @searcher=nil, @options={:lock_retry_time=>2, :path=>"script/../config/../config/../index/development/article", :create_if_missing=>true, :default_field=>[:title], :analyzer=>#, :auto_flush=>true, :or_default=>false, :dir=>#, :key=>:id, :handle_parse_errors=>true}, @mon_waiting_queue=[]> Next, I tried the search_each method: >> index.search_each("Ruby",{}) do |doc, score| ?> puts index[doc][:title] >> puts doc,score >> end nil 1 0.764464735984802 Great! I receive a score value (score) and the correct id value (doc), but it's not possible to get the :title value with the "index[doc][:title]" expression... :( It always returns nil :( Is there any way to achieve it? Thanks for all! From julioody at gmail.com Tue Mar 13 18:29:59 2007 From: julioody at gmail.com (Julio Cesar Ody) Date: Wed, 14 Mar 2007 09:29:59 +1100 Subject: [Ferret-talk] index returns all results for specific queries In-Reply-To: References: Message-ID: Thanks David, I instantiated a StandardAnalyzer and passed an empty array for stop words, and it did the trick.
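Dave's explanation of why "title:me" matched everything, and Julio's fix of an empty stop-word list, can be reproduced with a toy model in plain Ruby. This is not Ferret's QueryParser. The stop-word list is a small illustrative subset, and the parse, the error-tolerant fallback (strip special characters, search the remaining words in every field) and the whole-word matching are simplified assumptions that follow the behaviour described in the thread. The four documents mirror the title and description fields of Julio's script.

```ruby
# Toy model (not Ferret's QueryParser) of how "title:me" ends up matching
# every document when "me" is a stop word.
STOP_WORDS = %w[a an and me my of the to].freeze # illustrative subset only

DOCS = [
  { :title => "example title", :description => "description example" },
  { :title => "a new title",   :description => "more foo desc" },
  { :title => "random title",  :description => "random description" },
  { :title => "random title",  :description => "random description" }
].freeze

# "field:term" -> [field, term]. A stop-word term is removed, leaving
# "field:" with nothing after it, which this model treats as a parse error.
def parse_field_query(query, stop_words)
  field, term = query.include?(":") ? query.split(":", 2) : [nil, query]
  term = term.to_s.downcase
  if term.empty? || stop_words.include?(term)
    raise ArgumentError, "invalid query #{query.inspect}"
  end
  [field && field.to_sym, term]
end

def search(query, stop_words = STOP_WORDS)
  field, term = parse_field_query(query, stop_words)
  DOCS.select do |doc|
    values = field ? [doc[field].to_s] : doc.values
    values.any? { |v| v.split.include?(term) }
  end
rescue ArgumentError
  # Fallback: drop special characters and search the remaining words in
  # every field, so "title:me" degrades to a plain search for "title".
  words = query.gsub(/[^\w\s]/, " ").downcase.split - stop_words
  DOCS.select { |doc| doc.values.any? { |v| (v.split & words).any? } }
end

%w[title:me title:my title:mo].each do |q|
  puts "#{q}: #{search(q).size} hits" # 4, 4 and 0 hits respectively
end

# Julio's fix, modelled as an empty stop-word list:
puts "title:me, no stop words: #{search('title:me', []).size} hits" # 0 hits
```

On the question of what is lost by dropping stop words: very common words are then indexed and searchable too, which generally means a somewhat larger index and slower queries on those words; this toy model does not measure that cost in Ferret.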
If anyone wants to comment on what I'm losing by doing this, it would be really nice. On 3/13/07, David Balmain wrote: [..] -- Julio C. Ody http://rootshell.be/~julioody From dbalmain.ml at gmail.com Tue Mar 13 21:57:00 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 14 Mar 2007 12:57:00 +1100 Subject: [Ferret-talk] index.rb:384 [BUG] In-Reply-To: <1123b0ee30e7d6424eec68c34386e441@ruby-forum.com> References: <9f7eac58b1f5966123b9aa1366264e0a@ruby-forum.com> <1123b0ee30e7d6424eec68c34386e441@ruby-forum.com> Message-ID: On 3/14/07, Xavier Belanche wrote: [..] > Great! I receive a score value (score) and the correct id value (doc), but > it's not possible to get the :title value with the > "index[doc][:title]" expression... :( It always returns nil > :( > Is there any way to achieve it? > > Thanks for all! Did you rebuild your index when you changed the title to a stored field? I don't think this will happen automatically. -- Dave Balmain http://www.davebalmain.com/ From dbalmain.ml at gmail.com Tue Mar 13 22:07:35 2007 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 14 Mar 2007 13:07:35 +1100 Subject: [Ferret-talk] memory leak in index build? In-Reply-To: <45F6921C.6030304@innerewut.de> References: <45F5ED89.5010004@inforadical.net> <45F6921C.6030304@innerewut.de> Message-ID: On 3/13/07, Jonathan Weiss wrote: > > > > > It's not too hard. Something like: > > > > $ sudo apt-get install debconf > > $ sudo dpkg-reconfigure locales > > > On the subject of the locale stuff, would it be possible to create a > configuration option that explicitly sets Ferret to UTF-8 mode?
> > I think that a lot of people have been bitten by this and an explicit > configuration option IMHO makes a lot of sense. With acts_as_ferret it > might look like this: > > > class A < ActiveRecord::Base > acts_as_ferret :encoding => 'utf8' > end The problem is that this may give people the false impression that Ferret will handle UTF-8 even when they don't have a UTF-8 locale installed. For example, adding this configuration option wouldn't have helped Caleb. I guess one possibility would be to raise an exception if the locale isn't available. You could also automatically convert all text to UTF-8 using iconv. I don't know how much this would help, but I would certainly commit a patch along these lines if anyone is up for it. Cheers, Dave -- Dave Balmain http://www.davebalmain.com/ From kraemer at webit.de Wed Mar 14 04:59:26 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 14 Mar 2007 09:59:26 +0100 Subject: [Ferret-talk] Acts_as_ferret and auto-flush In-Reply-To: <623EA2CB-0687-4828-8FB9-6AC8BA35C76E@laika.nl> References: <623EA2CB-0687-4828-8FB9-6AC8BA35C76E@laika.nl> Message-ID: <20070314085926.GJ1352@cordoba.webit.de> On Tue, Mar 13, 2007 at 05:20:51PM +0100, jeroen janssen wrote: > Hi, > > I'm using acts_as_ferret with Mongrel and I'm getting locking > errors that after a while result in a corrupt database. What version of Ferret do you use? The latest Ferret versions (0.11.x) should show much better behaviour with shared index access. > I know about the problem with different processes writing to the > index, but I haven't been able to get the DRb server working properly > yet. I read on this list that another solution is to set :auto_flush > to true, but I'm not sure how to do this. As I understand it, I have to > do this for the Index class and not for the model that > acts_as_ferret, right? How exactly do I do this? Do I just have to > make a new Index model?
acts_as_ferret already uses :auto_flush => true in normal use (but not in rebuild_index, because it would be really dumb to flush the index after every record while batch indexing all records). cheers, Jens From neongrau at gmail.com Wed Mar 14 11:45:04 2007 From: neongrau at gmail.com (neongrau __) Date: Wed, 14 Mar 2007 16:45:04 +0100 Subject: [Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records Message-ID: Hi! After wondering why I couldn't find a lot of records, I eventually found the problem in the sqlserver_adapter's implementation of "add_limit_offset!". The problem is that when using MSSQL with the sqlserver_adapter, paging will only work when at least one column is defined in ":order". For example, I was indexing a table with 2912 records, and the generated SQL for the batches was this: SELECT * FROM (SELECT TOP 1000 * FROM (SELECT TOP 1000 * FROM table_name) AS tmp1 ) AS tmp2 SELECT * FROM (SELECT TOP 1000 * FROM (SELECT TOP 2000 * FROM table_name) AS tmp1 ) AS tmp2 SELECT * FROM (SELECT TOP 912 * FROM (SELECT TOP 2912 * FROM table_name) AS tmp1 ) AS tmp2 As you can imagine, it was indexing the same top 1000 records 3 times :( I think a default ordering by the primary key would help to eliminate that problem. From kraemer at webit.de Wed Mar 14 12:07:12 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 14 Mar 2007 17:07:12 +0100 Subject: [Ferret-talk] Using act_as_ferret with find_by_sql In-Reply-To: References: Message-ID: <20070314160712.GB23223@cordoba.webit.de> Hi! On Mon, Mar 05, 2007 at 10:14:41AM +0100, Henrik Zagerholm wrote: > Hello, > > I wonder if it's possible to combine ferret queries with find_by_sql > queries?
> > Or should I try to rewrite my query using find and then use > find_by_contents when I'm done? That's one option. The other is to use find_id_by_contents, collect the ids and use them in your find_by_sql statement. cheers, Jens From kraemer at webit.de Wed Mar 14 12:14:17 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 14 Mar 2007 17:14:17 +0100 Subject: [Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records In-Reply-To: References: Message-ID: <20070314161417.GC23223@cordoba.webit.de> On Wed, Mar 14, 2007 at 04:45:04PM +0100, neongrau __ wrote: [..] > I think a default ordering by the primary key would help to eliminate > that problem. I just committed this, so could you please try if current trunk fixes this for you? Jens From neongrau at gmail.com Wed Mar 14 12:43:35 2007 From: neongrau at gmail.com (neongrau __) Date: Wed, 14 Mar 2007 17:43:35 +0100 Subject: [Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records In-Reply-To: <20070314161417.GC23223@cordoba.webit.de> References: <20070314161417.GC23223@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > I just committed this, so could you please try if current trunk fixes > this for you? Hi! Thanks for your fast response. Yes, it looks like everything gets indexed now. But now find_id_by_contents somehow doesn't work as before; has it changed? I was using this to get all ids as an array: .find_id_by_contents(search, {:limit => :all}).collect {|x| x[:id]} From kraemer at webit.de Wed Mar 14 12:53:21 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 14 Mar 2007 17:53:21 +0100 Subject: [Ferret-talk] [AAF] remote indexing via DRb with acts_as_ferret In-Reply-To: <596199d7e57a4ff0249a31e8911e4fbc@ruby-forum.com> References: <20070204193455.GB29012@cordoba.webit.de> <9dfbb7b7672096f845c20c1b11370f57@ruby-forum.com> <20070304214947.GB28769@cordoba.webit.de> <46c98b69e8256bd86b075184ed9e286c@ruby-forum.com> <91221b6790a53d531ca8907ab97c0069@ruby-forum.com> <20070306094456.GB28036@cordoba.webit.de> <596199d7e57a4ff0249a31e8911e4fbc@ruby-forum.com> Message-ID: <20070314165321.GF23223@cordoba.webit.de> Hi! On Wed, Mar 07, 2007 at 06:31:37AM +0100, Sanjay Kapoor wrote: [..]
> My search controller uses it like this: > > # build sort > @sf_published_at = Ferret::Search::SortField.new(:published_at_string, > :type => :string, :reverse => true) > @sort = Ferret::Search::Sort.new(@sf_published_at) > > # set options > @options = {:limit => 20, :sort => @sort} Could you please try @options = { :limit => 20, :sort => [ @sf_published_at ] } instead? I still have some problems marshalling Ferret classes; it seems the Sort class is one of them... Using the array instead of a Sort instance works fine here (Ferret 0.11.2). Jens From kraemer at webit.de Wed Mar 14 12:57:25 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 14 Mar 2007 17:57:25 +0100 Subject: [Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records In-Reply-To: References: <20070314161417.GC23223@cordoba.webit.de> Message-ID: <20070314165725.GG23223@cordoba.webit.de> On Wed, Mar 14, 2007 at 05:43:35PM +0100, neongrau __ wrote: [..] > I was using this to get all ids as an array: > .find_id_by_contents(search, {:limit => :all}).collect {|x| x[:id]} If used without a block, find_id_by_contents returns a 2-element array where the first element is the total number of hits and the last element is the results array, so .find_id_by_contents(search, {:limit => :all}).last.collect {|x| x[:id]} should work. I'm not sure when this behaviour was introduced, but I guess it was a while ago... Jens From neongrau at gmail.com Wed Mar 14 13:11:17 2007 From: neongrau at gmail.com (neongrau __) Date: Wed, 14 Mar 2007 18:11:17 +0100 Subject: [Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records In-Reply-To: <20070314165725.GG23223@cordoba.webit.de> References: <20070314161417.GC23223@cordoba.webit.de> <20070314165725.GG23223@cordoba.webit.de> Message-ID: <061f05622f353592bdc0b75e7de0e0ef@ruby-forum.com> Jens Kraemer wrote: > If used without a block, find_id_by_contents returns a 2-element array > where the first element is the total number of hits and the last element > is the results array, so OK, found that while checking on the console. But what's the difference from just asking for .length on the resulting array with the old behavior? From jeroen at laika.nl Wed Mar 14 14:06:09 2007 From: jeroen at laika.nl (jeroen janssen) Date: Wed, 14 Mar 2007 19:06:09 +0100 Subject: [Ferret-talk] Acts_as_ferret and auto-flush In-Reply-To: <20070314085926.GJ1352@cordoba.webit.de> References: <623EA2CB-0687-4828-8FB9-6AC8BA35C76E@laika.nl> <20070314085926.GJ1352@cordoba.webit.de> Message-ID: >> I'm using acts_as_ferret with Mongrel and I'm getting locking >> errors that after a while result in a corrupt database. > > What version of Ferret do you use? The latest Ferret versions (0.11.x) > should show much better behaviour with shared index access. Thanks, that seems to work a little better. Unfortunately I'm still having some problems. The best solution would probably be to get the DRb server to work, but I haven't had much luck with that yet. As a temporary solution I was thinking of just not letting the model index itself on create, and doing a scheduled rebuild every hour or something.
Is there any way to have an acts_as_ferret model not update the index automatically? -- Jeroen From xbelanch at gmail.com Wed Mar 14 14:11:32 2007 From: xbelanch at gmail.com (Xavier Belanche) Date: Wed, 14 Mar 2007 19:11:32 +0100 Subject: [Ferret-talk] index.rb:384 [BUG] In-Reply-To: References: <9f7eac58b1f5966123b9aa1366264e0a@ruby-forum.com> <1123b0ee30e7d6424eec68c34386e441@ruby-forum.com> Message-ID: David Balmain wrote: > On 3/14/07, Xavier Belanche wrote: > >> adipiscing elit. Nullam tempor risus et ante. Maecenas consectetuer >> >> :path=>"script/../config/../config/../index/development/article", >> >> puts doc,score >> >> Thanks for all! > > Did you rebuild your index when you changed the title to a stored > field? I don't think this will happen automatically. Thanks! It's OK now :DDDDDDD I only had to run: >> Article.rebuild_index And that works! :D Thanks again! From kraemer at webit.de Thu Mar 15 04:23:57 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 15 Mar 2007 09:23:57 +0100 Subject: [Ferret-talk] Acts_as_ferret and auto-flush In-Reply-To: References: <623EA2CB-0687-4828-8FB9-6AC8BA35C76E@laika.nl> <20070314085926.GJ1352@cordoba.webit.de> Message-ID: <20070315082357.GH23223@cordoba.webit.de> On Wed, Mar 14, 2007 at 07:06:09PM +0100, jeroen janssen wrote: > >> I'm using acts_as_ferret with Mongrel and I'm getting locking > >> errors that after a while result in a corrupt database. > > > > What version of Ferret do you use? The latest Ferret versions (0.11.x) > > should show much better behaviour with shared index access. > > Thanks, that seems to work a little better. Unfortunately I'm still > having some problems. > > The best solution would probably be to get the DRb server to work, but I > haven't had much luck with that yet. What were your problems? > As a temporary solution I was > thinking of just not letting the model index itself on create, and doing a > scheduled rebuild every hour or something.
Is there any way to have an > acts_as_ferret model not update the index automatically? Yeah, override the ferret_enabled? instance method to return false, so the automatic indexing is skipped. In aaf trunk this method has an optional boolean parameter that indicates whether it is called from rebuild_index (true) or not (false, default). Before that change, it was not called at all when the index was rebuilt. Jens From kraemer at webit.de Thu Mar 15 04:28:24 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 15 Mar 2007 09:28:24 +0100 Subject: [Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records In-Reply-To: <061f05622f353592bdc0b75e7de0e0ef@ruby-forum.com> References: <20070314161417.GC23223@cordoba.webit.de> <20070314165725.GG23223@cordoba.webit.de> <061f05622f353592bdc0b75e7de0e0ef@ruby-forum.com> Message-ID: <20070315082824.GI23223@cordoba.webit.de> On Wed, Mar 14, 2007 at 06:11:17PM +0100, neongrau __ wrote: > Jens Kraemer wrote: > > If used without a block, find_id_by_contents returns a 2-element array > > where the first element is the total number of hits and the last element > > is the results array, so > > OK, found that while checking on the console. > > But what's the difference from just asking for .length on the resulting array > with the old behavior? If you use :limit to fetch only 20 or so results, total_hits will still tell you the total number of results, while results.length will be 20. Jens From jeroen at laika.nl Thu Mar 15 04:44:04 2007 From: jeroen at laika.nl (jeroen janssen) Date: Thu, 15 Mar 2007 09:44:04 +0100 Subject: [Ferret-talk] Acts_as_ferret and auto-flush In-Reply-To: <20070315082357.GH23223@cordoba.webit.de> References: <623EA2CB-0687-4828-8FB9-6AC8BA35C76E@laika.nl> <20070314085926.GJ1352@cordoba.webit.de> <20070315082357.GH23223@cordoba.webit.de> Message-ID: <03C57B3A-C83A-453E-BCEF-4A8E24A28128@laika.nl> > On Wed, Mar 14, 2007 at 07:06:09PM +0100, jeroen janssen wrote: >>>> I'm using acts_as_ferret with Mongrel and I'm getting locking >>>> errors that after a while result in a corrupt database. >>> >>> What version of Ferret do you use? The latest Ferret versions >>> (0.11.x) >>> should show much better behaviour with shared index access. >> >> Thanks, that seems to work a little better. Unfortunately I'm still >> having some problems. >> >> The best solution would probably be to get the DRb server to work, but I >> haven't had much luck with that yet. > > What were your problems? If the load on the server is not too high, it works all right, but after a while I get exceptions, after which the index gets corrupted, I think. I have included some of these errors at the end of this e-mail. My problem with the DRb server is simply that I can't get the script to run. When I run script/runner vendor/plugins/acts_as_ferret/script/ferret_server I get an 'undefined local variable or method `vendor' for # (NameError)' This happens on my server and on my local machine. (Rails 1.2.1) If I copy the script to RAILS_ROOT/lib and do script/runner "require 'ferret_server'" as you suggested earlier, I don't get an error, but I also don't get any feedback that something is running.
When I try to search I get a 'druby://localhost:9009 - #' error. >> As a temporary solution I was >> thinking of just not letting the model index itself on create, and doing a >> scheduled rebuild every hour or something. Is there any way to have an >> acts_as_ferret model not update the index automatically? > > Yeah, override the ferret_enabled? instance method to return false, so > the automatic indexing is skipped. In aaf trunk this method has an > optional boolean parameter that indicates whether it is called from > rebuild_index (true) or not (false, default). Before that change, it was not > called at all when the index was rebuilt. OK, thanks, I will see if this helps for now... ---- Here are some of the errors I'm getting now: -- A IOError occurred in search#weblogs: IO Error occured at :93 in xraise Error occured in index.c:886 - sis_find_segments_file Error reading the segment infos. Store listing was /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:706:in `initialize' ---- A EOFError occurred in weblog#show_by_login: End-of-File Error occured at :117 in xpop_context Error occured in store.c:216 - is_refill current pos = 0, file length = 0 /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:285:in `delete' ---- A FileNotFoundError occurred in weblog#show_by_login: File Not Found Error occured at :93 in xraise Error occured in fs_store.c:329 - fs_open_input tried to open "/www/wnf.dma.nl/rails_app/config/../index/production/user/_1ez_0.f5" but it doesn't exist: ---- A FileNotFoundError occurred in weblog#show_by_login: File Not Found Error occured at :117 in xpop_context Error occured in fs_store.c:329 - fs_open_input tried to open "/www/wnf.dma.nl/rails_app/config/../index/production/user/_50t_w.del" but it doesn't exist: /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:285:in `delete' --- A FileNotFoundError occurred in account#logout: File Not Found Error occured at :117 in xpop_context Error occured in fs_store.c:329 -
fs_open_input
tried to open "/www/wnf.dma.nl/rails_app/config/../index/production/user/_50t_w.del" but it doesn't exist:

/usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:285:in `delete'

From kraemer at webit.de Thu Mar 15 06:11:07 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 15 Mar 2007 11:11:07 +0100
Subject: [Ferret-talk] Acts_as_ferret and auto-flush
In-Reply-To: <03C57B3A-C83A-453E-BCEF-4A8E24A28128@laika.nl>
References: <623EA2CB-0687-4828-8FB9-6AC8BA35C76E@laika.nl> <20070314085926.GJ1352@cordoba.webit.de> <20070315082357.GH23223@cordoba.webit.de> <03C57B3A-C83A-453E-BCEF-4A8E24A28128@laika.nl>
Message-ID: <20070315101107.GA6202@cordoba.webit.de>

On Thu, Mar 15, 2007 at 09:44:04AM +0100, jeroen janssen wrote:
> > On Wed, Mar 14, 2007 at 07:06:09PM +0100, jeroen janssen wrote:
[..]
>
> My problem with the DRb server is simply that I can't get the script to run. When I run script/runner vendor/plugins/acts_as_ferret/script/ferret_server I get an 'undefined local variable or method `vendor' for # (NameError)'
>
> This happens on my server and on my local machine. (Rails 1.2.1)
>
> If I copy the script to RAILS_ROOT/lib and do script/runner "require 'ferret_server'" as you suggested earlier, I don't get an error, but I also don't get any feedback that something is running. When I try to search I get a 'druby://localhost:9009 - # Connection refused - connect(2)>' error.

could you please try

  ./script/runner "load 'ferret_server'"

instead? This seems to fix the problem. I hope I'll find the time soon to integrate the start/stop scripts that were recently posted to the list.

[..]
> > Here are some of the errors I'm getting now:

did you rebuild your index after upgrading to 0.11.3?

Besides that, I don't know what to do about these errors - maybe Dave can comment on these?

Jens.
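The 'Connection refused' error jeroen quotes above means nothing was accepting connections on the DRb port when the search ran. Before digging into acts_as_ferret itself, it can help to confirm whether the server process is listening at all. A minimal sketch in modern Ruby, assuming only the standard `socket` and `uri` libraries; `drb_server_listening?` is a hypothetical helper, and it only checks that something accepts TCP connections on the URI's host and port, not that a working ferret_server is speaking DRb there.

```ruby
require "socket"
require "uri"

# Hypothetical helper: true if something accepts TCP connections at the
# host/port of a druby:// URI; false if the connection is refused, times
# out, or the host cannot be resolved.
def drb_server_listening?(uri, timeout: 1)
  parsed = URI.parse(uri)
  # With a block, Socket.tcp returns the block's value and closes the socket.
  Socket.tcp(parsed.host, parsed.port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

For the setup in this thread one would call `drb_server_listening?("druby://localhost:9009")` from the Rails console before running a search; `false` points at the server process not running rather than at the index.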
[..]

--
Jens Krämer
webit!
From jeroen at laika.nl Thu Mar 15 06:43:51 2007
From: jeroen at laika.nl (jeroen janssen)
Date: Thu, 15 Mar 2007 11:43:51 +0100
Subject: [Ferret-talk] Acts_as_ferret and auto-flush
In-Reply-To: <20070315101107.GA6202@cordoba.webit.de>
References: <623EA2CB-0687-4828-8FB9-6AC8BA35C76E@laika.nl> <20070314085926.GJ1352@cordoba.webit.de> <20070315082357.GH23223@cordoba.webit.de> <03C57B3A-C83A-453E-BCEF-4A8E24A28128@laika.nl> <20070315101107.GA6202@cordoba.webit.de>
Message-ID:

> could you please try
>
> ./script/runner "load 'ferret_server'"
>
> instead? this seems to fix the problem. Hope I'll find the time to integrate the start/stop scripts that recently were posted to the list soon.

Hey, thanks! That seems to do the trick, on my local server at least. I'm going to try it out soon on the production server.

> did you rebuild your index after upgrading to 0.11.3?

Yeah, I did.

> Besides that, I don't know what to do about these errors - maybe Dave can comment on these?

No problem, I will see if it works better with the DRb server.

Thanks a lot for your help and for the excellent work on acts_as_ferret.

From caleb at inforadical.net Fri Mar 16 05:21:23 2007
From: caleb at inforadical.net (Caleb Clausen)
Date: Fri, 16 Mar 2007 02:21:23 -0700
Subject: [Ferret-talk] ferret on 64bit systems?
Message-ID: <45FA6193.7040102@inforadical.net>

I'm still having some crashes on my server that don't seem to happen on my development system. One difference between them is that the server is running in 64bit mode. Are there any issues running ferret on a 64bit system? I've seen some old traffic on the subject but all from about 9 months ago.

There are some warnings printed out when I install ferret on the server that I don't recall seeing on my workstation.
None of them seemed terribly noxious, but I haven't really looked into what code causes these. Anyway, here are the warnings for 0.11.3:

Building native extensions. This could take a while...
r_search.c: In function 'frt_td_to_s':
r_search.c:202: warning: format '%d' expects type 'int', but argument 3 has type 'long int'
ferret.c: In function 'object_add2':
ferret.c:69: warning: cast from pointer to integer of different size
ferret.c:69: warning: cast from pointer to integer of different size
ferret.c: In function 'object_del2':
ferret.c:88: warning: cast from pointer to integer of different size
compound_io.c: In function 'cmpdi_read_i':
compound_io.c:135: warning: format '%lld' expects type 'long long int', but argument 4 has type 'off_t'

The last warning was repeated many times; I didn't save them all in the log, sorry.

From starburger234 at yahoo.de Sun Mar 18 16:11:21 2007
From: starburger234 at yahoo.de (Star Burger)
Date: Sun, 18 Mar 2007 21:11:21 +0100
Subject: [Ferret-talk] "ö" causes find_by_contents not to return
Message-ID: <82b98055afee3bf1922170448d197ed5@ruby-forum.com>

I've installed ferret 0.10.9 together with the latest acts_as_ferret on Windows XP and indexed a location database (geonames.org) with Location.rebuild_index. The data is in UTF-8.

Now calling Location.find_by_contents "ö" does not return a result, causes a lot of CPU load, and finally exits with the error "index.rb:702: in 'parse': failed to allocate memory (NoMemoryError)". It seems to be a problem in 'process_query'.

I sometimes get similar results for other German umlauts...
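If the Ferret build in use cannot analyze UTF-8 text (as Dave Balmain explains later in this thread for the Windows build), one possible workaround is to transcode both the indexed text and the query string to ISO-8859-1 before they reach Ferret. This only works under the assumption that every character you need, such as German umlauts, exists in Latin-1. A minimal sketch in modern Ruby using the `unpack("U*")`/`pack("C*")` idiom; `utf8_to_latin1` is a hypothetical helper, not part of Ferret or acts_as_ferret.

```ruby
# Hypothetical helper: convert a UTF-8 string to ISO-8859-1 bytes.
# unpack("U*") yields Unicode code points; pack("C*") writes each one as a
# single byte and would silently truncate values above 255, so those are
# mapped to "?" (63) first.
def utf8_to_latin1(str)
  str.unpack("U*").map { |cp| cp > 255 ? 63 : cp }.pack("C*")
end
```

For example, `utf8_to_latin1("Behörde").bytes` gives `[66, 101, 104, 246, 114, 100, 101]`, with the umlaut as the single Latin-1 byte 246. The same conversion has to be applied to query strings, or they will not match what was indexed.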
From starburger234 at yahoo.de Mon Mar 19 01:46:00 2007
From: starburger234 at yahoo.de (Star Burger)
Date: Mon, 19 Mar 2007 06:46:00 +0100
Subject: [Ferret-talk] Many index files
Message-ID: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com>

I'm using acts_as_ferret and have indexed a model with

  acts_as_ferret :fields => [:name, :ascii_name, :alt_names], :single_index => true

Now more than 95,000 files have been generated in the index directory! The number of tuples I'm indexing is approx. 86,000.

I can't remember this from earlier ferret/acts_as_ferret versions, where I indexed millions of tuples without getting such a number of files.

Is there a way of reducing the number of index files? What are the consequences?

Thanks.

From kraemer at webit.de Mon Mar 19 10:19:57 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 19 Mar 2007 15:19:57 +0100
Subject: [Ferret-talk] Many index files
In-Reply-To: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com>
References: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com>
Message-ID: <20070319141957.GH6202@cordoba.webit.de>

On Mon, Mar 19, 2007 at 06:46:00AM +0100, Star Burger wrote:
> I'm using acts_as_ferret and have indexed a model with acts_as_ferret :fields => [:name, :ascii_name, :alt_names], :single_index => true.
>
> Now more than 95,000 files have been generated in the index directory! The number of tuples I'm indexing is approx. 86,000.

That doesn't sound ok. Is the index usable? And did you do a rebuild that resulted in this index, or was it normal application usage?

> I can't remember this from earlier ferret/acts_as_ferret versions, where I indexed millions of tuples without getting such a number of files.
>
> Is there a way of reducing the number of index files? What are the consequences?

Try to optimize the index - either directly with Ferret

  i = Ferret::I.new(:path => 'path/to/index')
  i.optimize

or via aaf:

  Model.aaf_index.ferret_index.optimize

>
> Thanks.
--
Jens Krämer
webit!

From starburger234 at yahoo.de Mon Mar 19 11:25:08 2007
From: starburger234 at yahoo.de (Star Burger)
Date: Mon, 19 Mar 2007 16:25:08 +0100
Subject: [Ferret-talk] Many index files
In-Reply-To: <20070319141957.GH6202@cordoba.webit.de>
References: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com> <20070319141957.GH6202@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
> On Mon, Mar 19, 2007 at 06:46:00AM +0100, Star Burger wrote:
> [..]
> That doesn't sound ok. Is the index usable? And did you do a rebuild that resulted in this index, or was it normal application usage?
> [..]

The index is usable (although it doesn't seem to be the fastest) and is the direct result of Model.rebuild_index. The index wasn't built up step by step from application usage, but with a single rebuild_index from a filled DB.
starburger

From kraemer at webit.de Mon Mar 19 12:29:27 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 19 Mar 2007 17:29:27 +0100
Subject: [Ferret-talk] Many index files
In-Reply-To:
References: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com> <20070319141957.GH6202@cordoba.webit.de>
Message-ID: <20070319162927.GA9039@cordoba.webit.de>

On Mon, Mar 19, 2007 at 04:25:08PM +0100, Star Burger wrote:
[..]
>
> The index is usable (although it doesn't seem to be the fastest) and is the direct result of Model.rebuild_index. The index wasn't built up step by step from application usage, but with a single rebuild_index from a filled DB.

I guess optimizing the index didn't solve the problem?

Jens

--
Jens Krämer
webit!

From spahl at rift.fr Mon Mar 19 14:27:07 2007
From: spahl at rift.fr (Sebastien Pahl)
Date: Mon, 19 Mar 2007 19:27:07 +0100
Subject: [Ferret-talk] Concurrency Problem in 0.11.3
Message-ID: <70b2335f0e8990eb4a8f42ee7d1dcf72@ruby-forum.com>

Hi,

I'm having some strange/random crashes with ferret when using different programs on the same index.
I created a script to reproduce the errors: http://www.sig11.org/~seb/ferret_crash.rb

Usage:
In one terminal run:
  ruby ferret_crash.rb first
In another terminal:
  ruby ferret_crash.rb

The errors I usually get are the following, but it is really random:

/usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:542:in `close': End-of-File Error occured at :93 in xraise (EOFError)
Error occured in store.c:216 - is_refill
current pos = 0, file length = 0
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:542:in `flush'
        from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:539:in `flush'
        from ferret_crash.rb:25:in `create'
        from ferret_crash.rb:58

==============================================================================

E/usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:298: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i486-linux]

==============================================================================

/usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:542:in `close': IO Error occured at :93 in xraise (IOError)
Error occured in fs_store.c:221 - fs_length
getting lenth of /tmp/libferrettest/_v4.fdx:
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:542:in `flush'
        from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:539:in `flush'
        from ferret_crash.rb:25:in `create'
        from ferret_crash.rb:58

I tried using the latest trunk version, but it didn't help.

Thanks.

Seb
--
Sebastien Pahl - Rift Technologies
spahl at rift.fr
From john at johnleach.co.uk Mon Mar 19 15:08:03 2007
From: john at johnleach.co.uk (John Leach)
Date: Mon, 19 Mar 2007 19:08:03 +0000
Subject: [Ferret-talk] Concurrency Problem in 0.11.3
In-Reply-To: <70b2335f0e8990eb4a8f42ee7d1dcf72@ruby-forum.com>
References: <70b2335f0e8990eb4a8f42ee7d1dcf72@ruby-forum.com>
Message-ID: <1174331283.4087.5.camel@localhost.localdomain>

Hi Sebastien,

I've had the same segfault problems and I'm told these are now fixed in subversion and will be in the next release.

As for your other problems, Ferret does not support multiple processes writing to the same index. It's recommended that you use a DRb service to achieve this instead.

John.

On Mon, 2007-03-19 at 19:27 +0100, Sebastien Pahl wrote:
> Hi,
>
> I'm having some strange/random crashes with ferret when using different programs on the same index.

--
http://johnleach.co.uk

From spahl at rift.fr Mon Mar 19 17:51:05 2007
From: spahl at rift.fr (Sebastien Pahl)
Date: Mon, 19 Mar 2007 22:51:05 +0100
Subject: [Ferret-talk] Concurrency Problem in 0.11.3
In-Reply-To: <1174331283.4087.5.camel@localhost.localdomain>
References: <70b2335f0e8990eb4a8f42ee7d1dcf72@ruby-forum.com> <1174331283.4087.5.camel@localhost.localdomain>
Message-ID:

Thanks for the quick answer. I tried it with the latest svn version, and the segfaults are gone. I'll have a look at DRb.

John Leach wrote:
> I've had the same segfault problems and I'm told these are now fixed in subversion and will be in the next release.
>
> As for your other problems, Ferret does not support multiple processes writing to the same index. It's recommended that you use a DRb service to achieve this instead.

--
Sebastien Pahl - Rift Technologies
spahl at rift.fr
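John's advice amounts to giving one process exclusive ownership of the index and having every other process (Mongrels, background jobs, scripts like ferret_crash.rb) send its writes to that process over DRb. A minimal sketch of that shape in modern Ruby, assuming a plain Array as a stand-in for the real index object. The class and method names are hypothetical, and the mutex only makes the serialization of writes explicit inside the owning process.

```ruby
require "drb/drb"

# Hypothetical single-writer service: one process owns the index and applies
# every write; other processes call it over DRb instead of opening the index
# themselves. The backend is a plain Array standing in for the real index.
class IndexWriterService
  def initialize(backend = [])
    @backend = backend
    @lock = Mutex.new # DRb serves clients on separate threads
  end

  # Append one document and return the new document count.
  def add(doc)
    @lock.synchronize do
      @backend << doc
      @backend.size
    end
  end

  def size
    @lock.synchronize { @backend.size }
  end
end

# Publish the service at a druby:// URI and return DRb's server thread;
# a long-running writer process would call .join on it.
def start_index_service(uri, service)
  DRb.start_service(uri, service)
  DRb.thread
end
```

The writer process would run something like `start_index_service("druby://localhost:9009", IndexWriterService.new).join`, and each client would get a proxy with `DRbObject.new_with_uri("druby://localhost:9009")` and call `add` on it. Since no client opens the index directory, the concurrent-writer failures shown above cannot arise from these clients.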
From Neville.Burnell at bmsoft.com.au Mon Mar 19 18:20:05 2007
From: Neville.Burnell at bmsoft.com.au (Neville Burnell)
Date: Tue, 20 Mar 2007 09:20:05 +1100
Subject: [Ferret-talk] Many index files
References: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com> <20070319141957.GH6202@cordoba.webit.de> <20070319162927.GA9039@cordoba.webit.de>
Message-ID: <126EC586577FD611A28E00A0C9A03758B5C6D2@maui.bmsoft.com.au>

I have seen this issue with Ferret also. Somehow, a working index had nearly 250,000 files, requiring 2.5GB. Rebuilding the index resulted in the count dropping to 900 files requiring only 700MB.

> -----Original Message-----
> From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of Jens Kraemer
> Sent: Tuesday, 20 March 2007 3:29 AM
> To: ferret-talk at rubyforge.org
> Subject: Re: [Ferret-talk] Many index files
>
> On Mon, Mar 19, 2007 at 04:25:08PM +0100, Star Burger wrote:
> [..]
> I guess optimizing the index didn't solve the problem?
>
> Jens
>
> --
> Jens Krämer
> webit!
From starburger234 at yahoo.de Mon Mar 19 19:59:23 2007
From: starburger234 at yahoo.de (Star Burger)
Date: Tue, 20 Mar 2007 00:59:23 +0100
Subject: [Ferret-talk] Many index files
In-Reply-To: <20070319162927.GA9039@cordoba.webit.de>
References: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com> <20070319141957.GH6202@cordoba.webit.de> <20070319162927.GA9039@cordoba.webit.de>
Message-ID: <880104bdf3f885336eafe06430e8f713@ruby-forum.com>

Jens Kraemer wrote:
> On Mon, Mar 19, 2007 at 04:25:08PM +0100, Star Burger wrote:
> [..]
> I guess optimizing the index didn't solve the problem?
>
> Jens

Location.aaf_index.ferret_index.optimize

resulted in "=> nil" and took only a fraction of a second. The index structure didn't change.

From dbalmain.ml at gmail.com Mon Mar 19 20:56:40 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 20 Mar 2007 11:56:40 +1100
Subject: [Ferret-talk] ferret on 64bit systems?
In-Reply-To: <45FA6193.7040102@inforadical.net>
References: <45FA6193.7040102@inforadical.net>
Message-ID:

Hi Caleb,

I'll try to work on these warnings and put out a new release. Give me a day or two. It is quite probable that these warnings are related to the issues you are having. Thanks for posting them.

Cheers,
Dave

On 3/16/07, Caleb Clausen wrote:
> I'm still having some crashes on my server that don't seem to happen on my development system. One difference between them is that the server is running in 64bit mode. Are there any issues running ferret on a 64bit system? I've seen some old traffic on the subject but all from about 9 months ago.
>
> There are some warnings printed out when I install ferret on the server that I don't recall seeing on my workstation. None of them seemed terribly noxious, but I haven't really looked into what code causes these. Anyway, here are the warnings for 0.11.3:
>
> Building native extensions. This could take a while...
> r_search.c: In function 'frt_td_to_s':
> r_search.c:202: warning: format '%d' expects type 'int', but argument 3 has type 'long int'
> ferret.c: In function 'object_add2':
> ferret.c:69: warning: cast from pointer to integer of different size
> ferret.c:69: warning: cast from pointer to integer of different size
> ferret.c: In function 'object_del2':
> ferret.c:88: warning: cast from pointer to integer of different size
> compound_io.c: In function 'cmpdi_read_i':
> compound_io.c:135: warning: format '%lld' expects type 'long long int', but argument 4 has type 'off_t'
>
> The last warning was repeated many times; I didn't save them all in the log, sorry.
--
Dave Balmain
http://www.davebalmain.com/

From starburger234 at yahoo.de Mon Mar 19 21:17:41 2007
From: starburger234 at yahoo.de (Star Burger)
Date: Tue, 20 Mar 2007 02:17:41 +0100
Subject: [Ferret-talk] Many index files
In-Reply-To: <880104bdf3f885336eafe06430e8f713@ruby-forum.com>
References: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com> <20070319141957.GH6202@cordoba.webit.de> <20070319162927.GA9039@cordoba.webit.de> <880104bdf3f885336eafe06430e8f713@ruby-forum.com>
Message-ID: <5df722240663a599bb72cd2e9fbd5032@ruby-forum.com>

Star Burger wrote:
> Jens Kraemer wrote:
>> On Mon, Mar 19, 2007 at 04:25:08PM +0100, Star Burger wrote:
>> [..]
>> I guess optimizing the index didn't solve the problem?
>
> Location.aaf_index.ferret_index.optimize
>
> resulted in "=> nil" and took only a fraction of a second. The index structure didn't change.

Again - the number of files was the result of one indexing process: Model.rebuild_index.

Most files (e.g. _j2.cfs) are listed as 1 KB only.

I've read in earlier posts that these might be temporary files that ferret couldn't delete for some reason. Is there a way to fix that?
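To see what the 95,000 files actually are before reporting them, it can help to summarize the index directory by file extension and pick out the segments* files (the kind of listing Dave asks for in his reply). A minimal read-only sketch in modern Ruby (`Dir.children` needs Ruby 2.5 or later); `index_dir_summary` is a hypothetical helper, and it only looks at file names and sizes, not at Ferret's file formats.

```ruby
# Hypothetical diagnostic: count files and total bytes per extension in an
# index directory, and list any files whose names start with "segments".
def index_dir_summary(dir)
  by_ext = Hash.new { |h, ext| h[ext] = { count: 0, bytes: 0 } }
  segments = []
  Dir.children(dir).sort.each do |name|
    path = File.join(dir, name)
    next unless File.file?(path) # skip subdirectories
    segments << name if name.start_with?("segments")
    ext = File.extname(name)
    ext = "(none)" if ext.empty?
    by_ext[ext][:count] += 1
    by_ext[ext][:bytes] += File.size(path)
  end
  { by_extension: by_ext, segments_files: segments }
end
```

Pointed at the model's index directory (the path depends on the application, e.g. something under `index/production/`), a result dominated by thousands of tiny `.cfs` files would match the symptom described above, and the `segments_files` entry names the file Dave asks for.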
From dbalmain.ml at gmail.com Mon Mar 19 22:43:46 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 20 Mar 2007 13:43:46 +1100
Subject: [Ferret-talk] "ö" causes find_by_contents not to return
In-Reply-To: <82b98055afee3bf1922170448d197ed5@ruby-forum.com>
References: <82b98055afee3bf1922170448d197ed5@ruby-forum.com>
Message-ID:

On 3/19/07, Star Burger wrote:
> I've installed ferret 0.10.9 together with the latest acts_as_ferret on Windows XP and indexed a location database (geonames.org) with Location.rebuild_index. The data is in UTF-8.
>
> Now calling Location.find_by_contents "ö" does not return a result, causes a lot of CPU load, and finally exits with the error "index.rb:702: in 'parse': failed to allocate memory (NoMemoryError)". It seems to be a problem in 'process_query'.
>
> I sometimes get similar results for other German umlauts...

Unfortunately Ferret doesn't come with UTF-8 support on Windows, as the win32 runtime environment doesn't seem to support UTF-8. You will therefore need to write your own analyzer on Windows if you want to support UTF-8 searches. Hopefully the NoMemoryError will be fixed in the next win32 gem I release.

--
Dave Balmain
http://www.davebalmain.com/

From dbalmain.ml at gmail.com Mon Mar 19 22:50:45 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 20 Mar 2007 13:50:45 +1100
Subject: [Ferret-talk] Many index files
In-Reply-To: <5df722240663a599bb72cd2e9fbd5032@ruby-forum.com>
References: <1dd92dc8821545b7a24dadfdf13fe5d3@ruby-forum.com> <20070319141957.GH6202@cordoba.webit.de> <20070319162927.GA9039@cordoba.webit.de> <880104bdf3f885336eafe06430e8f713@ruby-forum.com> <5df722240663a599bb72cd2e9fbd5032@ruby-forum.com>
Message-ID:

On 3/20/07, Star Burger wrote:
> Star Burger wrote:
> > Jens Kraemer wrote:
> >> On Mon, Mar 19, 2007 at 04:25:08PM +0100, Star Burger wrote:
> >> [..]
> >> I guess optimizing the index didn't solve the problem?
> >
> > Location.aaf_index.ferret_index.optimize
> >
> > resulted in "=> nil" and took only a fraction of a second. The index structure didn't change.
>
> Again - the number of files was the result of one indexing process: Model.rebuild_index.
>
> Most files (e.g. _j2.cfs) are listed as 1 KB only.
>
> I've read in earlier posts that these might be temporary files that ferret couldn't delete for some reason. Is there a way to fix that?

Could you send me a full listing of the directory privately, as well as a copy of the segments_* file? That would be a big help in debugging this problem.

Cheers,
Dave

--
Dave Balmain
http://www.davebalmain.com/

From thomas.senf at web.de Tue Mar 20 12:10:29 2007
From: thomas.senf at web.de (Thomas Senf)
Date: Tue, 20 Mar 2007 17:10:29 +0100
Subject: [Ferret-talk] Strange Results For Term Frequencies
Message-ID: <2339ca81173712e6a32e4705d0c70478@ruby-forum.com>

I would like to thank all the people who have contributed to this very fine project. Great work!

I've encountered some strange results while examining the term frequency of one of my indexed documents. The indexed terms seem to vary for the very same document depending on the presence or absence of completely unrelated operations in the code, so the resulting term frequency changes, too.
I repeatedly call 'index_reader.term_docs_for' for the only document I've indexed in the snippet below, but depending on the presence of the statement 'dummy_count = 0' or some formatting code for the output, the resulting term frequencies change from correct answers to wrong ones. Sometimes terms are not found at all.

To make this easier to examine, I've added a complete snippet that produces this behavior on my system (the text is taken from http://de.wikipedia.org/wiki/Entgelt). I'm working with ferret version 0.11.3, C extensions compiled with VC6.0 (but the 0.10.9-mswin32 binaries from the ferret gem show the same behavior), and Ruby version 1.8.5.

Does anybody have an explanation for this, or am I misusing something?

  require 'rubygems'
  require 'ferret'
  $KCODE='u'
  text = < StemAnalyzer.new())
  @index << {:title => "Entgelt", :content => text}
  #dummy_count = 0
  index_reader = @index.reader
  tde=index_reader.term_docs_for(:content, "Vertrag")
  tde.each{|did,freq| puts "Term \'Vertrag\' occurs in Document \'#{@index[did][:title]}\' #{freq} times (5 expected)\n"}
  tde=index_reader.term_docs_for(:content, "BGB")
  tde.each{|did,freq| puts "Term \'BGB\' occurs in Document \'#{@index[did][:title]}\' #{freq} times (3 expected)\n"}
  tde=index_reader.term_docs_for(:content, "Leistung")
  tde.each{|did,freq| puts "Term \'Leistung\' occurs in Document \'#{@index[did][:title]}\' #{freq} times (12 expected)\n"}

Output:

=> Using Ferret v0.11.3...
=> Using Ruby v1.8.5...
=> Term 'Vertrag' occurs in Document 'Entgelt' 4 times (5 expected)
=> Term 'Leistung' occurs in Document 'Entgelt' 3 times (12 expected)

Output after removing the comment in 'dummy_count=0':

=> Using Ferret v0.11.3...
=> Using Ruby v1.8.5...
=> Term 'Vertrag' occurs in Document 'Entgelt' 5 times (5 expected)
=> Term 'BGB' occurs in Document 'Entgelt' 3 times (3 expected)
=> Term 'Leistung' occurs in Document 'Entgelt' 12 times (12 expected)
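To sanity-check expected counts like 5, 3 and 12 independently of Ferret, the (document id, frequency) pairs that term_docs_for yields can be recomputed with a toy in-memory inverted index. A minimal sketch in modern Ruby; the names are hypothetical and this is not Ferret's API. The naive tokenizer (lowercasing, splitting on non-alphanumeric characters, no stemming, no stop words) will not behave like StemAnalyzer, so counts for inflected forms such as "Vertrags" can legitimately differ.

```ruby
# Toy inverted index (hypothetical, not Ferret's API):
# term => { doc_id => frequency }.
def build_postings(docs)
  postings = Hash.new { |h, term| h[term] = Hash.new(0) }
  docs.each_with_index do |text, doc_id|
    text.downcase.scan(/[[:alnum:]]+/).each { |token| postings[token][doc_id] += 1 }
  end
  postings
end

# Returns [[doc_id, freq], ...] for a term, in the spirit of iterating the
# pairs from term_docs_for; [] if the term never occurs.
def toy_term_docs_for(postings, term)
  postings.fetch(term.downcase, {}).to_a
end
```

Running this over the Wikipedia text from the snippet gives a baseline that does not depend on the Ferret build, which helps separate "the expected numbers are wrong" from "the index is returning inconsistent results".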
From dbalmain.ml at gmail.com Tue Mar 20 19:10:59 2007
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 21 Mar 2007 10:10:59 +1100
Subject: [Ferret-talk] Strange Results For Term Frequencies
In-Reply-To: <2339ca81173712e6a32e4705d0c70478@ruby-forum.com>
References: <2339ca81173712e6a32e4705d0c70478@ruby-forum.com>
Message-ID:

On 3/21/07, Thomas Senf wrote:
> I would like to thank all the people who have contributed to this very fine project. Great work!
>
> I've encountered some strange results while examining the term frequency of one of my indexed documents. [..]
>
> Does anybody have an explanation for this, or am I misusing something?
> Test Code

Hi Thomas,

Firstly, well done compiling Ferret on Windows, and thanks for posting this. The reason I haven't yet released a win32 gem is that I'm still trying to work out the String#dump issue, which is wreaking havoc when people try to use Ferret with Rails on Windows. I suspect this issue of yours is somehow related. I'll let you know as soon as I find a solution.
Cheers,
Dave

--
Dave Balmain
http://www.davebalmain.com/

From pritchie at videotron.ca Tue Mar 20 19:56:49 2007
From: pritchie at videotron.ca (Patrick Ritchie)
Date: Tue, 20 Mar 2007 19:56:49 -0400
Subject: [Ferret-talk] Strange Results For Term Frequencies
In-Reply-To:
References: <2339ca81173712e6a32e4705d0c70478@ruby-forum.com>
Message-ID: <460074C1.9020206@videotron.ca>

David Balmain wrote:
> On 3/21/07, Thomas Senf wrote:
>> [..]
>> Does anybody have an explanation for this, or am I misusing something?
>> Test Code

I ran the test code on both the 0.10.9 win32 gem and on Cygwin with 0.11.3. Here are the results:

# dummy_count = 0

Using Ferret v0.10.9...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 4 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 1 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 5 times (12 expected)

Using Ferret v0.11.3...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 5 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 9 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 12 times (12 expected)

dummy_count = 0

C:\Documents and Settings\Patrick Ritchie\ruby>ruby tf_test.rb
Using Ferret v0.10.9...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 4 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 1 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 5 times (12 expected)

Using Ferret v0.11.3...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 5 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 9 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 12 times (12 expected)

Results don't seem to change when dummy_count is set. I think the difference between the Cygwin and the straight win32 builds is the UTF-8 support.

Cheers!
Patrick

From lajanus at o2.pl Wed Mar 21 08:32:37 2007
From: lajanus at o2.pl (Linus)
Date: Wed, 21 Mar 2007 13:32:37 +0100
Subject: [Ferret-talk] store_class_name for Comatose:Page model
Message-ID:

Hi,
I have three models: Comatose::Page, Article and Product.
In all of them, store_class_name is set to true.
Now, when I do:

results = Comatose::Page.multi_search("*", [Article,Product], options)

I get:

wrong constant name Comatose::Page

#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:438:in `const_get'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:438:in `multi_search'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:460:in `id_multi_search'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/multi_index.rb:28:in `search_each'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/multi_index.rb:28:in `search_each'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:457:in `id_multi_search'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:437:in `multi_search'
#{RAILS_ROOT}/app/controllers/application.rb:44:in `run_search'
#{RAILS_ROOT}/app/controllers/application.rb:68:in `full_text_search'
#{RAILS_ROOT}/app/controllers/search_controller.rb:16:in `index'
-e:4:in `load'
-e:4

The ferret index files include the name Comatose::Page and are put in the index/developement/comatose/page directory.

Any idea why I might get that error? I don't get it with

results = Comatose::Page.find_by_contents(q, options)

Best regards,
Tom

--
Posted via http://www.ruby-forum.com/.

From thomas.senf at web.de  Wed Mar 21 09:18:09 2007
From: thomas.senf at web.de (Thomas Senf)
Date: Wed, 21 Mar 2007 14:18:09 +0100
Subject: [Ferret-talk] "ö" causes find_by_content s not to return
In-Reply-To:
References: <82b98055afee3bf1922170448d197ed5@ruby-forum.com>
Message-ID: <45f1e91f75cbe157215e77ca22cbe4aa@ruby-forum.com>

David Balmain wrote:
>
> Unfortunately Ferret doesn't come with UTF-8 support in Windows as the
> win32 runtime environment doesn't seem to support UTF-8. You will
> therefore need to write your own analyzer on Windows if you want to
> support UTF-8 searches.
>

Hello Star Burger,

if you're planning to write your own UTF-8 Analyzer, consider the unpack/pack duo:

latin1_string = utf8_string_from_db.unpack("U*").pack("C*")
@index << {:content => latin1_string}
@index.search_each('content:Behörde') {|id, score| do_sth}

I didn't try this in aaf, but with plain Ruby it worked in my case.

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de  Wed Mar 21 10:16:33 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 21 Mar 2007 15:16:33 +0100
Subject: [Ferret-talk] store_class_name for Comatose:Page model
In-Reply-To:
References:
Message-ID: <20070321141633.GG9602@cordoba.webit.de>

On Wed, Mar 21, 2007 at 01:32:37PM +0100, Linus wrote:
> Hi,
> I have three models: Comatose::Page, Article and Product.
> In all of them, store_class_name is set to true.
>
> Now, when i do:
> results = Comatose::Page.multi_search("*", [Article,Product], options)
> I get:
>
> wrong constant name Comatose::Page
>
> #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:438:in
> `const_get'
[..]
> Any idea why I might get that error?

Yeah, const_get can't resolve namespaced constant names - this issue is fixed in trunk. If you don't want to upgrade, replace the call to const_get with something like

model_class = model_name_string.constantize

(I don't know the name of the variable holding the model's name off the top of my head.)

[..]
> I dont get it with results = Comatose::Page.find_by_contents(q, options)

That's because in this case (a one-model query) there is no need to look up a class based on the model name stored in the index.

cheers,
Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From lajanus at o2.pl  Wed Mar 21 11:20:09 2007
From: lajanus at o2.pl (Linus)
Date: Wed, 21 Mar 2007 16:20:09 +0100
Subject: [Ferret-talk] store_class_name for Comatose:Page model
In-Reply-To: <20070321141633.GG9602@cordoba.webit.de>
References: <20070321141633.GG9602@cordoba.webit.de>
Message-ID: <7ab3b9678ccd556b60ca51958b1a2d30@ruby-forum.com>

Jens Kraemer wrote:
> On Wed, Mar 21, 2007 at 01:32:37PM +0100, Linus wrote:
>> #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:438:in
>> `const_get'
> [..]
>> Any idea why I might get that error?
>
> yeah, const_get can't resolve namespaced constant names - this issue is
> fixed in trunk.

Thank you, trunk works OK :-)

Best regards,
Tom

--
Posted via http://www.ruby-forum.com/.

From lajanus at o2.pl  Wed Mar 21 12:57:46 2007
From: lajanus at o2.pl (Linus)
Date: Wed, 21 Mar 2007 17:57:46 +0100
Subject: [Ferret-talk] Score more if begins with query
Message-ID: <2a2907233d22e235e1b13d5b28994e03@ruby-forum.com>

Hi,
I need products whose names begin with the query to score higher than those that merely contain it.

I am not sure where to start researching that...
Any ideas?

Best regards,
Tom

--
Posted via http://www.ruby-forum.com/.

From henke at mac.se  Wed Mar 21 17:56:42 2007
From: henke at mac.se (Henrik Zagerholm)
Date: Wed, 21 Mar 2007 22:56:42 +0100
Subject: [Ferret-talk] Cannot delete for id of type Array
Message-ID: <69B536CE-48D6-4AD4-AD9D-9C6C85BD5106@mac.se>

Hello list,

I have a slightly weird error when deleting documents from the index.

I'm using the following code.
ferret_index = Ferret::Index::Index.new(:path => FERRET_INDEX_PATH)
query = Ferret::Search::TermQuery.new(:fk_file_id, "#{_fk_file_id}")
ferret_index.search_each(query) do | id |
  ferret_index.delete(id)
end

And I get the following error:

Cannot delete for id of type Array

As I see it, the only way this could happen is if search_each returns an Array of IDs, but it couldn't, right?

I'm using version 0.11.3.

Regards,
Henrik

From kraemer at webit.de  Wed Mar 21 18:17:19 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 21 Mar 2007 23:17:19 +0100
Subject: [Ferret-talk] Cannot delete for id of type Array
In-Reply-To: <69B536CE-48D6-4AD4-AD9D-9C6C85BD5106@mac.se>
References: <69B536CE-48D6-4AD4-AD9D-9C6C85BD5106@mac.se>
Message-ID: <20070321221719.GA8027@cordoba.webit.de>

On Wed, Mar 21, 2007 at 10:56:42PM +0100, Henrik Zagerholm wrote:
> Hello list,
>
> I have a little weird error when deleting documents from the index.
>
> I'm using the following code.
>
> ferret_index = Ferret::Index::Index.new(:path => FERRET_INDEX_PATH)
> query = Ferret::Search::TermQuery.new(:fk_file_id, "#{_fk_file_id}")
> ferret_index.search_each(query) do | id |
>   ferret_index.delete(id)
> end
>
> And I get the following error
> Cannot delete for id of type Array
>
>
> As I see it the only way this could happened is if search_each
> returns an Array of ID's but it couldn't right?

From the API docs:

search_each(query, options = {}) {|doc, score| ...}

You see that Ferret hands you two arguments into the block.

Now if you only accept one parameter, Ruby guesses that you want all parameters as an array. This should yield a warning like 'multiple values for a block parameter (2 for 1)' somewhere in your logs.

So just use |id, score| and everything should be fine :-)

cheers,
Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From julioody at gmail.com  Wed Mar 21 20:07:57 2007
From: julioody at gmail.com (Julio Cesar Ody)
Date: Thu, 22 Mar 2007 11:07:57 +1100
Subject: [Ferret-talk] "ö" causes find_by_content s not to return
In-Reply-To: <45f1e91f75cbe157215e77ca22cbe4aa@ruby-forum.com>
References: <82b98055afee3bf1922170448d197ed5@ruby-forum.com> <45f1e91f75cbe157215e77ca22cbe4aa@ruby-forum.com>
Message-ID:

I tried this with a UTF-8 encoded string (Japanese):

"\u304A\u308C\u3068\u9B5A".unpack("U*").pack("C*")

Which gives me this in return:

"u304Au308Cu3068u9B5A"

And that's not what I want stored in my index, right?

Now I'm pretty sure I'm doing something dumb :-) Hopefully someone can clarify.

Thanks.

On 3/22/07, Thomas Senf wrote:
> David Balmain wrote:
> >
> > Unfortunately Ferret doesn't come with UTF-8 support in Windows as the
> > win32 runtime environment doesn't seem to support UTF-8. You will
> > therefore need to write your own analyzer on Windows if you want to
> > support UTF-8 searches.
> >
>
> Hello Star Burger,
>
> if you're planning to write your own UTF-8 Analyzer consider the
> unpack/pack duo:
>
> utf-8_encoded_string_from_db.unpack("U*").pack("C*")
> @index << {:content => utf-8_encoded_string_from_db}
> @index.search_each('content:Behörde') {|id,score| do_sth}
>
> I didn't try this in afa, but with ruby it worked in my case.
>
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Julio C.
Ody
http://rootshell.be/~julioody

From caleb at inforadical.net  Wed Mar 21 20:46:14 2007
From: caleb at inforadical.net (Caleb Clausen)
Date: Wed, 21 Mar 2007 17:46:14 -0700
Subject: [Ferret-talk] Cannot delete for id of type Array
In-Reply-To:
References:
Message-ID: <4601D1D6.5060706@inforadical.net>

Henrik Zagerholm wrote:
> ferret_index = Ferret::Index::Index.new(:path => FERRET_INDEX_PATH)
> query = Ferret::Search::TermQuery.new(:fk_file_id, "#{_fk_file_id}")
> ferret_index.search_each(query) do | id |
>   ferret_index.delete(id)
> end

This way is a little more direct (untested):

ferret_index = Ferret::Index::IndexWriter.new(:path => FERRET_INDEX_PATH)
ferret_index.delete(:fk_file_id, _fk_file_id)

From henke at mac.se  Thu Mar 22 03:50:40 2007
From: henke at mac.se (Henrik Zagerholm)
Date: Thu, 22 Mar 2007 08:50:40 +0100
Subject: [Ferret-talk] Cannot delete for id of type Array
In-Reply-To: <20070321221719.GA8027@cordoba.webit.de>
References: <69B536CE-48D6-4AD4-AD9D-9C6C85BD5106@mac.se> <20070321221719.GA8027@cordoba.webit.de>
Message-ID: <197987DC-692F-4EC5-9223-83B0828CA57A@mac.se>

On 21 Mar 2007, at 23:17, Jens Kraemer wrote:

> On Wed, Mar 21, 2007 at 10:56:42PM +0100, Henrik Zagerholm wrote:
>> Hello list,
>>
>> I have a little weird error when deleting documents from the index.
>>
>> I'm using the following code.
>>
>> ferret_index = Ferret::Index::Index.new(:path => FERRET_INDEX_PATH)
>> query = Ferret::Search::TermQuery.new(:fk_file_id, "#{_fk_file_id}")
>> ferret_index.search_each(query) do | id |
>>   ferret_index.delete(id)
>> end
>>
>> And I get the following error
>> Cannot delete for id of type Array
>>
>>
>> As I see it the only way this could happened is if search_each
>> returns an Array of ID's but it couldn't right?
>
> from the api docs:
> search_each(query, options = {}) {|doc, score| ...}
>
> you see that ferret hands you two arguments into the block.
>
> Now if you only accept one parameter, Ruby guesses that you want
> all parameters as an array.
> This should yield a warning like
> 'multiple values for a block parameter (2 for 1)'
> somewhere in your logs.
>
> So just use |id, score| and everything should be fine :-)

So it's that Ruby magic trying to figure out what I want again. =)
I was afraid that using only id and not score could be the problem, but I couldn't see why. Now I know!

Thanks!

>
> cheers,
> Jens

From henke at mac.se  Thu Mar 22 04:01:05 2007
From: henke at mac.se (Henrik Zagerholm)
Date: Thu, 22 Mar 2007 09:01:05 +0100
Subject: [Ferret-talk] Cannot delete for id of type Array
In-Reply-To: <4601D1D6.5060706@inforadical.net>
References: <4601D1D6.5060706@inforadical.net>
Message-ID:

Does not work. The delete method only takes 1 argument.

Cheers

On 22 Mar 2007, at 01:46, Caleb Clausen wrote:

> Henrik Zagerholm wrote:
>> ferret_index = Ferret::Index::Index.new(:path => FERRET_INDEX_PATH)
>> query = Ferret::Search::TermQuery.new(:fk_file_id, "#{_fk_file_id}")
>> ferret_index.search_each(query) do | id |
>>   ferret_index.delete(id)
>> end
>
> This way is a little more direct (untested):
>
> ferret_index = Ferret::Index::IndexWriter.new(:path
> =>FERRET_INDEX_PATH)
> ferret_index.delete(:fk_file_id, _fk_file_id)

From kraemer at webit.de  Thu Mar 22 05:28:17 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 22 Mar 2007 10:28:17 +0100
Subject: [Ferret-talk] Score more if begins with query
In-Reply-To: <2a2907233d22e235e1b13d5b28994e03@ruby-forum.com>
References: <2a2907233d22e235e1b13d5b28994e03@ruby-forum.com>
Message-ID: <20070322092817.GB30747@cordoba.webit.de>

On Wed, Mar 21, 2007 at 05:57:46PM +0100, Linus wrote:
> Hi,
> I need to score more on products, those names
> begin with query, rather then just contain it.
>
> I am not sure where to start research on that...
> Any ideas?

You could index the first word of your product names in a separate field, and give that field a boost. So any query with a term that matches the first word of the product name would score higher.

However, there might well be better solutions to this problem that I can't think of right now :-)

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From ahfeel at rift.fr  Thu Mar 22 06:09:52 2007
From: ahfeel at rift.fr (ahFeel)
Date: Thu, 22 Mar 2007 11:09:52 +0100
Subject: [Ferret-talk] Url searching ?
Message-ID:

Hi all :)

I have many objects with a url field, of course containing standard URLs...
I'm trying to match them, but I'm actually having problems with that. Here's a small piece of code showing what I would like to achieve:

require 'rubygems'
require 'ferret'
require 'ftools'

class TestAnalyzer
  def token_stream(field, str)
    ts = Ferret::Analysis::AsciiStandardTokenizer.new(str)
    ts = Ferret::Analysis::AsciiLowerCaseFilter.new(ts)
  end
end

system 'rm -rf /tmp/ferret_test' if File.exists?('/tmp/ferret_test')
File.mkpath('/tmp/ferret_test')

INDEX = Ferret::I.new(:path => '/tmp/ferret_test', :analyzer => TestAnalyzer.new)

INDEX << {:type => :url, :url => 'http://google.fr'}
INDEX << {:type => :url, :url => 'http://ferret.davebalmain.com'}
INDEX << {:type => :url, :url => 'http://www.unixaumonde.com'}
INDEX << {:type => :url, :url => 'http://www.rift.fr'}

['type:url AND url:*google*',
 'type:url AND url:*"://foobar"*',
 'type:url AND url:"http://goo"*',
 'type:url AND url:"http://goo*"'].each do |q|
  puts "\nSearching #{q}"
  INDEX.search(q).hits.each { |x| p INDEX[x.doc].load }
  puts "\n"
end

Thanks in advance!

Regards,
Jeremie 'ahFeel' BORDIER
Rift Technologies

--
Posted via http://www.ruby-forum.com/.

From mattias at oncotype.dk  Thu Mar 22 11:04:49 2007
From: mattias at oncotype.dk (Mattias Bud)
Date: Thu, 22 Mar 2007 16:04:49 +0100
Subject: [Ferret-talk] Make ferret serach for english words like "and"
Message-ID: <51dfb6136bcf028f6c34703daacd2068@ruby-forum.com>

I use Ferret on a Danish site where I would like Ferret to search for words like "under" and "and". These are words that are excluded by Ferret but have a different meaning in Danish. Can you disable this in Ferret?

Cheers
Mattias

--
Posted via http://www.ruby-forum.com/.

From mattias at oncotype.dk  Thu Mar 22 11:29:13 2007
From: mattias at oncotype.dk (Mattias Bodlund)
Date: Thu, 22 Mar 2007 16:29:13 +0100
Subject: [Ferret-talk] Noice words...
Message-ID:

Hi

I use acts_as_ferret on an app that is in Danish and English. In Danish, English words like "and" and "under" have meaning.
Is it possible to make Ferret search for these words? As it is now, a search for "under" returns nothing even though I know the word is present in the index.

Cheers
Mattias

From kraemer at webit.de  Thu Mar 22 12:05:12 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 22 Mar 2007 17:05:12 +0100
Subject: [Ferret-talk] Noice words...
In-Reply-To:
References:
Message-ID: <20070322160512.GI30747@cordoba.webit.de>

On Thu, Mar 22, 2007 at 04:29:13PM +0100, Mattias Bodlund wrote:
> Hi
>
> I use acts_as_ferret on an app that is in Danish and English. In
> Danish english words like "and" and "under" has meaning. Is it
> possible to make ferret search for these words? As it is now a seach
> for "under" returns nothing even-though I know the word is present in
> the index.

Construct your own StandardAnalyzer and specify your custom stop word list (or an empty array for no stop words at all).

See
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardAnalyzer.html

With aaf you can specify that analyzer like this:

class MyModel
  acts_as_ferret( { :fields => { ... } }, { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) } )
  ...
end

Note that the analyzer option goes into the second argument hash (where all options go that are handed through to Ferret's Ferret::Index::Index instance).

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de  Thu Mar 22 12:08:37 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 22 Mar 2007 17:08:37 +0100
Subject: [Ferret-talk] Noice words...
In-Reply-To: <20070322160512.GI30747@cordoba.webit.de>
References: <20070322160512.GI30747@cordoba.webit.de>
Message-ID: <20070322160837.GJ30747@cordoba.webit.de>

On Thu, Mar 22, 2007 at 05:05:12PM +0100, Jens Kraemer wrote:
> On Thu, Mar 22, 2007 at 04:29:13PM +0100, Mattias Bodlund wrote:
> > Hi
> >
> > I use acts_as_ferret on an app that is in Danish and English. In
> > Danish english words like "and" and "under" has meaning. Is it
> > possible to make ferret search for these words? As it is now a seach
> > for "under" returns nothing even-though I know the word is present in
> > the index.
>
> construct your own StandardAnalyzer and specify your custom stop word list (or
> an empty array for no stop words at all).
>
> See
> http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardAnalyzer.html
>
> With aaf you can specify that analyzer like that:
>
> class MyModel
>   acts_as_ferret( { :fields => { ... } }, { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) } )
>   ...
>
> end
>
> note the analyzer option goes into the second argument hash (where all
> options go that are handed through to ferret's Ferret::Index::Index
> instance).

One more note: you need to rebuild your index after changing analyzers.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From mattias at oncotype.dk  Thu Mar 22 12:22:31 2007
From: mattias at oncotype.dk (Mattias Bodlund)
Date: Thu, 22 Mar 2007 17:22:31 +0100
Subject: [Ferret-talk] Noice words...
In-Reply-To: <20070322160837.GJ30747@cordoba.webit.de>
References: <20070322160512.GI30747@cordoba.webit.de> <20070322160837.GJ30747@cordoba.webit.de>
Message-ID:

On 22/03/2007, at 17.08, Jens Kraemer wrote:

> On Thu, Mar 22, 2007 at 05:05:12PM +0100, Jens Kraemer wrote:
>> On Thu, Mar 22, 2007 at 04:29:13PM +0100, Mattias Bodlund wrote:
>>> Hi
>>>
>>> I use acts_as_ferret on an app that is in Danish and English. In
>>> Danish english words like "and" and "under" has meaning. Is it
>>> possible to make ferret search for these words? As it is now a seach
>>> for "under" returns nothing even-though I know the word is
>>> present in
>>> the index.
>>
>> construct your own StandardAnalyzer and specify your custom stop
>> word list (or
>> an empty array for no stop words at all).
>>
>> See
>> http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardAnalyzer.html
>>
>> With aaf you can specify that analyzer like that:
>>
>> class MyModel
>> acts_as_ferret( { :fields => { ... } }, { :analyzer =>
>> Ferret::Analysis::StandardAnalyzer.new([]) } )
>> ...
>>
>> end
>>
>> note the analyzer option goes into the second argument hash (where
>> all
>> options go that are handed through to ferret's Ferret::Index::Index
>> instance).
>
> one more note - you need to rebuild your index after changing
> analyzers.
>
> Jens
>

Thanks a lot, Jens. I'll integrate it and then reindex the thing.

mattias

From spahl at rift.fr  Fri Mar 23 06:10:31 2007
From: spahl at rift.fr (Sebastien Pahl)
Date: Fri, 23 Mar 2007 11:10:31 +0100
Subject: [Ferret-talk] Concurrency Problem in 0.11.3
In-Reply-To: <1174331283.4087.5.camel@localhost.localdomain>
References: <70b2335f0e8990eb4a8f42ee7d1dcf72@ruby-forum.com> <1174331283.4087.5.camel@localhost.localdomain>
Message-ID:

Isn't the ferret-write.lck supposed to take care of this problem?

John Leach wrote:
> ... Ferret does not support multiple processes writing to the same index. ...
--
Sebastien Pahl - Rift Technologies
spahl at rift.fr

--
Posted via http://www.ruby-forum.com/.

From laurent.farcy at neuf.fr  Fri Mar 23 10:56:18 2007
From: laurent.farcy at neuf.fr (Laurent Lau)
Date: Fri, 23 Mar 2007 15:56:18 +0100
Subject: [Ferret-talk] Any chance to get 0.11.3 on windows soon ?
Message-ID: <56a3755138984d7014dc85d7095bf90a@ruby-forum.com>

Hi,

I'm working on a Ferret-based application which indexes content in all European languages. Thus, I have to deal with those funny European characters. After googling a bit, I decided to move on with a custom European analyzer based on MappingFilter, as suggested in the Ferret rdoc. Everything works fine with Ferret 0.11.3 on Mac OS X.

But this application needs to run on both Windows and Mac OS X. Since there's no mswin32 gem for 0.11.3, I decided to downgrade to 0.10.9 and replace MappingFilter with a custom-made filter, as suggested by David in the following post:
http://www.ruby-forum.com/topic/85299#156036

See the code I wrote at the bottom of this post.

The token streams produced by this analyzer work fine in unit tests, but the indexer fails to use them when a document is added. Here's the stack trace I get (on Mac OS X):

wrong argument type Ferret::Analysis::ToASCIIFilter (expected Data)
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.9/lib/ferret/index.rb:277:in `text='
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.9/lib/ferret/index.rb:277:in `add_document'
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.9/lib/ferret/index.rb:277:in `<<'
/usr/local/lib/ruby/1.8/monitor.rb:238:in `synchronize'
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.9/lib/ferret/index.rb:252:in `<<'

I tried several variants of the code (such as avoiding super and inheritance), but never with success.

Therefore, I'm wondering whether 0.11.3 will be available soon on Windows. Or if I can build this gem myself (I guess I'll need a Microsoft C compiler). Or if I can do things differently to get a European analyzer with 0.10.9.

Thanks for your help.
Thanks for your help. Laurent --------------------------------------------------------------------------- require 'ferret' require 'jcode' module Ferret::Analysis ACCENTUATED_CHARS = '???????????????????????????????????????????????????????????????????????????????????????????????' REPLACEMENT_CHARS = 'aaaaaaaacccccddeeeeeeeeegggghhiiiiiiiijjjjkklllllnnnnnnooooooooooqrrrsssssttttuuuuuuuuuuwyyyzzz' MAPPING = { ['?','?','?','?','?','?','?','?'] => 'a', '?' => 'ae', ['?','?'] => 'd', ['?','?','?','?','?'] => 'c', ['?','?','?','?','?','?','?','?','?'] => 'e', ['?'] => 'f', ['?','?','?','?'] => 'g', ['?','?'] => 'h', ['?','?','?','?','?','?','?','?'] => 'i', ['?','?','?','?'] => 'j', ['?','?'] => 'k', ['?','?','?','?','?'] => 'l', ['?','?','?','?','?','?'] => 'n', ['?','?','?','?','?','?','?','?','?','?'] => 'o', ['?'] => 'oek', ['?'] => 'q', ['?','?','?'] => 'r', ['?','?','?','?','?'] => 's', ['?','?','?','?'] => 't', ['?','?','?','?','?','?','?','?','?','?'] => 'u', ['?'] => 'w', ['?','?','?'] => 'y', ['?','?','?'] => 'z' } class TokenFilter < TokenStream # Construct a token stream filtering the given input. def initialize(input) @input = input end end # replace accentuated chars with ASCII one class ToASCIIFilter < TokenFilter def next() token = @input.next() unless token.nil? token.text = token.text.tr(ACCENTUATED_CHARS, REPLACEMENT_CHARS) end token end end class EuropeanAnalyzer < StandardAnalyzer def token_stream(field, string) if defined?(MappingFilter) return MappingFilter.new(super, MAPPING) # 0.11.x else return ToASCIIFilter.new(super) # 0.10.x end end end end -- Posted via http://www.ruby-forum.com/. From spahl at rift.fr Fri Mar 23 12:12:01 2007 From: spahl at rift.fr (Sebastien Pahl) Date: Fri, 23 Mar 2007 17:12:01 +0100 Subject: [Ferret-talk] Multiple servers for one index Message-ID: <69969158b7928093d5a45ab31117a812@ruby-forum.com> Hi, I'm currently trying to set up a solution involving multiple servers using the same index over nfs. 
The problem is that, from what I have seen, Ferret doesn't support multiple processes writing to the same index. Using a DRb service is not an option, since it would create a single point of failure.

I tried using Ferret::Store::FSDirectory to create a write lock on the index directory with code somewhat like this:

[...]
dir = Ferret::Store::FSDirectory.new(INDEX_PATH)
write_lock = dir.make_lock("lock")
write_lock.obtain
index << {:id => id, :type => 'create_test_type'}
index.flush
write_lock.release
[...]

but in my different attempts it makes the processes freeze or raise a Ferret::Store::Lock::LockError.

I tried to play with IndexWriter options like max_merge_docs, merge_factor... but without success. Maybe there is a way to merge all the compound files every couple of writes instead of doing it on the fly.

Is there a way to achieve my goal?

Dave, please tell me you have an idea :-P

Thanks
Seb

--
Sebastien Pahl - Rift Technologies
spahl at rift.fr

--
Posted via http://www.ruby-forum.com/.

From lajanus at o2.pl  Fri Mar 23 13:08:08 2007
From: lajanus at o2.pl (Linus)
Date: Fri, 23 Mar 2007 18:08:08 +0100
Subject: [Ferret-talk] Score more if begins with query
In-Reply-To: <20070322092817.GB30747@cordoba.webit.de>
References: <2a2907233d22e235e1b13d5b28994e03@ruby-forum.com> <20070322092817.GB30747@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
> you could index the first word of your product names in a separate
> field, and give that field a boost. So any query with a term that
> matches the first word of the product name would score higher.
>
> however there might well be better solutions to this problem I can't
> think of right now :-)

I ended up with this in the controller:

q = "#{q}*^4 OR *#{q}*"

It seems to do the job :)

Best regards,
Tom

--
Posted via http://www.ruby-forum.com/.
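Jens's suggestion above, indexing the first word of each product name in its own field and boosting matches on that field, can be sketched in plain Ruby as two helpers: one that builds the document hash at index time and one that builds the query string. This is a minimal sketch under assumptions, not aaf or Ferret API: the helper names and the `:name` / `:name_first` fields are hypothetical, and it assumes Ferret's query parser accepts `field:term*^boost`, the same syntax Linus's query uses. Unlike `*#{q}*`, it uses only trailing wildcards, since a leading wildcard generally has to enumerate every term in the field.

```ruby
# Hypothetical helpers sketching "boost names that start with the query".
# Field names (:name, :name_first) are illustrative, not part of aaf/Ferret.

# Build the hash handed to the index: the full name plus its first word
# in a separate field that queries can boost.
def product_doc(name)
  cleaned = name.to_s.strip
  first_word = cleaned.split(/\s+/).first.to_s
  { :name => cleaned, :name_first => first_word }
end

# Build a query string: prefix matches on the first word get a boost,
# prefix matches on any word of the name still count.
# Anything other than letters/digits is dropped rather than escaped
# (a conservative assumption about the parser's special characters).
def begins_with_query(user_input, boost = 4)
  term = user_input.to_s.downcase.gsub(/[^[:alnum:]]/, "")
  return nil if term.empty?
  "name_first:#{term}*^#{boost} OR name:#{term}*"
end
```

Documents indexed before the `:name_first` field existed will not match the boosted clause until they are reindexed. Because non-alphanumeric characters are stripped, multi-word input collapses into a single term, so a real search box would probably split the input first.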
From marvin at rectangular.com  Fri Mar 23 13:05:36 2007
From: marvin at rectangular.com (Marvin Humphrey)
Date: Fri, 23 Mar 2007 10:05:36 -0700
Subject: [Ferret-talk] Multiple servers for one index
In-Reply-To: <69969158b7928093d5a45ab31117a812@ruby-forum.com>
References: <69969158b7928093d5a45ab31117a812@ruby-forum.com>
Message-ID: <15FA3654-3A78-4B4C-96BC-20285A026846@rectangular.com>

On Mar 23, 2007, at 9:12 AM, Sebastien Pahl wrote:

> Dave please tell me you have an idea:-P

Dave, I recently more-or-less solved the NFS problem in KinoSearch. The gist of the solution is to implement read-locking on IndexReaders via lock files, but leave it off by default -- so that only people who put their indexes on NFS need turn it on. More info in the "Read-locking on shared volumes" section here:

http://xrl.us/vfs2 (Link to www.rectangular.com)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

From matt at mattschnitz.com  Fri Mar 23 13:46:43 2007
From: matt at mattschnitz.com (Matt Schnitz)
Date: Fri, 23 Mar 2007 10:46:43 -0700
Subject: [Ferret-talk] Multiple servers for one index
In-Reply-To: <15FA3654-3A78-4B4C-96BC-20285A026846@rectangular.com>
References: <69969158b7928093d5a45ab31117a812@ruby-forum.com> <15FA3654-3A78-4B4C-96BC-20285A026846@rectangular.com>
Message-ID: <497cc4a0703231046n1383b125q59d78768663017d3@mail.gmail.com>

I personally would love some built-in support for multi-threaded write locking. It's pretty easy these days to set up a multithreaded Rails/Ferret server using Mongrel and Lighttpd.

It'd also be nice if the docs gave a special warning for this case. It came pretty unexpectedly.

Schnitz

On 3/23/07, Marvin Humphrey wrote:
>
>
> On Mar 23, 2007, at 9:12 AM, Sebastien Pahl wrote:
>
> > Dave please tell me you have an idea:-P
>
> Dave, I recently more-or-less solved the NFS problem in KinoSearch.
> The gist of the solution is to implement read-locking on IndexReaders
> via lock files, but leave it off by default -- so that only people
> who put their indexes on NFS need turn it on. More info in the "Read-
> locking on shared volumes" section here:
>
> http://xrl.us/vfs2 (Link to www.rectangular.com)
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/

From marvin at rectangular.com  Fri Mar 23 14:36:34 2007
From: marvin at rectangular.com (Marvin Humphrey)
Date: Fri, 23 Mar 2007 11:36:34 -0700
Subject: [Ferret-talk] Multiple servers for one index
In-Reply-To: <497cc4a0703231046n1383b125q59d78768663017d3@mail.gmail.com>
References: <69969158b7928093d5a45ab31117a812@ruby-forum.com> <15FA3654-3A78-4B4C-96BC-20285A026846@rectangular.com> <497cc4a0703231046n1383b125q59d78768663017d3@mail.gmail.com>
Message-ID: <188FF43D-F29E-4F17-9F04-941E680A3561@rectangular.com>

On Mar 23, 2007, at 10:46 AM, Matt Schnitz wrote:

> I personally would love some support for multi-threaded write
> locking, built-in. It's pretty easy this days to set up a
> multithreaded Rails/Ferret server using Mongrel and Lighttpd.

I'm not sure if Dave's solved a problem that neither Lucene nor KinoSearch has solved, but I'd say it's difficult to outright impossible to allow more than one write process access to the index at any given moment under the segmented, write-once model used by all of us.

What is possible is to manage access to an index on a shared volume so that an active write process causes all other attempts to open a write process to fail, including those from other machines.
The key is to put the write.lock file in the index directory, rather than in the temp directory -- since the temp directory is per-machine, no other machine knows about another machine's lock files and write processes may stomp each other.

I believe the default location of the lock directory was changed in Lucene in 2.1 (if not, the change is in svn trunk). It changed in KinoSearch as of 0.20_01, though with a twist that makes things more convenient for everyone else at a minor cost to NFS users:

  Concurrency

  Only one InvIndexer may write to an invindex at a time. If a write lock cannot be secured, new() will throw an exception.

  If your index is located on a shared volume, each writer application must identify itself by passing a LockFactory to InvIndexer's constructor, or index corruption will occur.

Imposing that condition means that stale lock files associated with dead pids can be zapped automatically by default.

In earlier versions of Lucene, it's possible to specify a global lock dir location, putting it on the shared volume for example and allowing multiple machines to become aware of each other's lock files. It wouldn't surprise me if Dave had duplicated that in Ferret.

> It'd also be nice if the docs gave special warning for this case.
> It came pretty unexpectedly.

NFS is a bleedin' PITA to support because it doesn't do "delete-on-last-close" and flock/fcntl locking is unreliable on so many operating systems. What I'd really like to do is detect NFS somehow and throw errors at construction time, but since that's not realistic, there are moderately prominent warnings now in the KS docs. It's not an ideal set-up, because inevitably some fraction of users will get burned when they move their indexes to NFS without taking stock of the warnings, but without getting into the gory details, I'll just say that's hard to avoid.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

From spahl at rift.fr Fri Mar 23 16:54:30 2007
From: spahl at rift.fr (Sebastien Pahl)
Date: Fri, 23 Mar 2007 21:54:30 +0100
Subject: [Ferret-talk] Multiple servers for one index
In-Reply-To: <188FF43D-F29E-4F17-9F04-941E680A3561@rectangular.com>
References: <69969158b7928093d5a45ab31117a812@ruby-forum.com> <15FA3654-3A78-4B4C-96BC-20285A026846@rectangular.com> <497cc4a0703231046n1383b125q59d78768663017d3@mail.gmail.com> <188FF43D-F29E-4F17-9F04-941E680A3561@rectangular.com>
Message-ID:

That is exactly what I tried with Ferret but it makes the processes freeze or raise a Ferret::Store::Lock::LockError.

Marvin Humphrey wrote:
> On Mar 23, 2007, at 10:46 AM, Matt Schnitz wrote:
>
> What is possible is to manage access to an index on a shared volume
> so that an active write process causes all other attempts to open a
> write process to fail, including those from other machines. The key
> is to put the write.lock file in the index directory, rather than in
> the temp directory -- since the temp directory is per-machine, no
> other machine knows about another machine's lock files and write
> processes may stomp each other.

--
Sebastien Pahl - Rift Technologies
spahl at rift.fr

--
Posted via http://www.ruby-forum.com/.

From marvin at rectangular.com Fri Mar 23 17:18:27 2007
From: marvin at rectangular.com (Marvin Humphrey)
Date: Fri, 23 Mar 2007 14:18:27 -0700
Subject: [Ferret-talk] Multiple servers for one index
In-Reply-To:
References: <69969158b7928093d5a45ab31117a812@ruby-forum.com> <15FA3654-3A78-4B4C-96BC-20285A026846@rectangular.com> <497cc4a0703231046n1383b125q59d78768663017d3@mail.gmail.com> <188FF43D-F29E-4F17-9F04-941E680A3561@rectangular.com>
Message-ID: <07DF6764-0DF2-4E2C-9CEC-71D8F55B4258@rectangular.com>

On Mar 23, 2007, at 1:54 PM, Sebastien Pahl wrote:

> That is exactly what I tried with Ferret but it makes the processes
> freeze or raise a Ferret::Store::Lock::LockError.
I'm less than completely familiar with how Ferret handles this, but in KS you'll get a lock error once the timeout is exceeded and it stops retrying. A freeze sounds wrong. I suspect the only way to make this work is to catch the LockError and retry.

Creating a queue for writers trying to access an NFS index, so that each new process starts immediately after an old process releases a lock... that would be great, but I don't know how you'd pull it off with lock files. Creating shared read locks was hard enough!

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

From no at spam.thanks Fri Mar 23 18:35:20 2007
From: no at spam.thanks (david)
Date: Fri, 23 Mar 2007 23:35:20 +0100
Subject: [Ferret-talk] how to select only some fields?
Message-ID: <196be902fa2b455da8e6f40ef9c1bc34@ruby-forum.com>

Hi, how can I select only some fields when Ferret does the query? Something like :select => 'id' for the find. I've tried to put it where I do the :include => [:something], but it doesn't work. Is that right, or do I have to put it in another place?

From ror_dave at yahoo.com Fri Mar 23 23:30:19 2007
From: ror_dave at yahoo.com (dave developer)
Date: Fri, 23 Mar 2007 23:30:19 -0400 (EDT)
Subject: [Ferret-talk] win32 11.1-rc2 ferret gem available?
Message-ID: <840662.71529.qm@web63204.mail.re1.yahoo.com>

Hello!

I just wanted to follow up on this previous post of mine; we've noticed that there have been several updates to the gem. I have two quick questions:

1. Is there a new win32 gem (since v10.9) on its way, and if so, when?
2. If we continue to develop with our v10.9 gem on our Windows XP laptops and then deploy to a Unix environment that would be using the latest gem (11.3+), should all of our syntax etc. work? (I understand that the indices would be built differently, but that's fine.)

Thanks again!
Dave

----- Original Message ----
From: dave developer
To: ferret-talk at rubyforge.org
Sent: Wednesday, February 28, 2007 11:24:12 AM
Subject: Re: [Ferret-talk] win32 11.1-rc2 ferret gem available?

Thanks for the quick response, Dave. I appreciate it -- it saved us tons of time with some potential debugging/environment setups. I'll let everyone know if we run into any errors while testing the new win32 gem upon its release as well. Thanks!

----- Original Message ----
From: David Balmain
To: ferret-talk at rubyforge.org
Sent: Wednesday, February 28, 2007 2:32:34 AM
Subject: Re: [Ferret-talk] win32 11.1-rc2 ferret gem available?

On 2/28/07, dave developer wrote:
> Hello!
>
> I am developing a rails application in a test environment on Windows XP
> and recently upgraded to InstantRails 1.5. During the upgrade process, I
> went to install the latest ferret gem and realized that the latest gem
> compiled for Windows was 10.9. We upgraded our production environment
> (Ubuntu) to 11.1-rc2 successfully, but wanted to find out if we were going
> to run into any problems conducting our testing with 10.9 and deploying into
> the 11.1-rc2 production environment. If we were, is there anyone that has
> compiled a win32 binary of the 11.1-rc2 gem?

Sorry, I didn't really fully answer your question. There are no major API changes between the two versions you are using, so if your app is working with 0.10.9 then it should work with 0.11.2. There are, however, differences in the index file format, so you can't copy the index across, and 0.10.9 has a lot more bugs than 0.11.2. But for the short term you should be fine.

--
Dave Balmain
http://www.davebalmain.com/

From akaspick at gmail.com Sat Mar 24 01:35:04 2007
From: akaspick at gmail.com (Andrew Kaspick)
Date: Sat, 24 Mar 2007 06:35:04 +0100
Subject: [Ferret-talk] getting fuzzy search to work
Message-ID:

Hello. Forgive my ignorance if this has been covered. I searched through the archives and didn't find anything.

All the blogs, tutorials and documentation I'm reading on acts_as_ferret say that to use fuzzy searching, I simply append a ~ to my query. I'm using Ferret 0.11.3 and (what should be) the newest acts_as_ferret plugin.

I can only get perfect matches to show up using a query such as...

  Blog.find_by_contents("test")

which returns results just fine. As soon as I add the ~

  Blog.find_by_contents("test~")

I get nothing. If I use wildcard syntax...

  Blog.find_by_contents("*test*")

I get results again.

I could be interpreting what I'm reading wrong or I have something misconfigured, but I'm confused by this. Can somebody point me in the right direction?

Thanks,
Andrew

From john at johnleach.co.uk Sat Mar 24 20:33:48 2007
From: john at johnleach.co.uk (John Leach)
Date: Sun, 25 Mar 2007 00:33:48 +0000
Subject: [Ferret-talk] win32 11.1-rc2 ferret gem available?
In-Reply-To: <840662.71529.qm@web63204.mail.re1.yahoo.com>
References: <840662.71529.qm@web63204.mail.re1.yahoo.com>
Message-ID: <1174782828.32061.21.camel@localhost.localdomain>

So far all my 0.10 code has worked fine with 0.11. From the announcement for 0.11.0:

"I've just released Ferret 0.11.0 which is the first release candidate for Ferret 1.0. This release has no new features to the API but it does fix some very major bugs."

"Some of these fixes mean that the current version of Ferret is not backwards compatible. If you install the latest version you will need to rebuild your index from scratch."

John.

On Fri, 2007-03-23 at 23:30 -0400, dave developer wrote:
> Hello!
>
> I just wanted to follow up on this previous post of mine, we've
> noticed that there have been several updates to the gem. I have two
> quick questions. Is there a new win32 gem (since v10.9) on its way --
> if so, when? and 2. If we continue to develop with our v10.9 gem on
> our windows XP laptops and then deploy to a unix environment that
> would be using the latest gem (11.3+), should all of our syntax, etc
> work (I understand that the indices would be built differently, but
> that's fine)?

--
http://johnleach.co.uk

From none at gmail.com Sun Mar 25 03:04:16 2007
From: none at gmail.com (mixplate)
Date: Sun, 25 Mar 2007 09:04:16 +0200
Subject: [Ferret-talk] kind of stuck with new version of ferret gem
Message-ID: <68dd51133fb61a03b13398a30779e5d9@ruby-forum.com>

....
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:201:in `send'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:201:in `rebuild_index'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:198:in `rebuild_index'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:197:in `rebuild_index'
/usr/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/abstract/database_statements.rb:59:in `transaction'
/usr/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/transactions.rb:95:in `transaction'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:196:in `rebuild_index'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:194:in `rebuild_index'
#{RAILS_ROOT}/app/controllers/tristate_controller.rb:13:in `index'

What am I missing? I've read that I may need to rebuild my old indexes from the previous version of Ferret, since they get corrupted during upgrades? Do I just do Model.rebuild_index, or do I delete the index?

From none at gmail.com Sun Mar 25 03:08:28 2007
From: none at gmail.com (mixplate)
Date: Sun, 25 Mar 2007 09:08:28 +0200
Subject: [Ferret-talk] kind of stuck with new version of ferret gem
In-Reply-To: <68dd51133fb61a03b13398a30779e5d9@ruby-forum.com>
References: <68dd51133fb61a03b13398a30779e5d9@ruby-forum.com>
Message-ID:

Never mind, fixed it: I added the missing code to the methods file. Google is awesome!

From senser.simon at gmail.com Mon Mar 26 00:56:36 2007
From: senser.simon at gmail.com (Jin)
Date: Mon, 26 Mar 2007 06:56:36 +0200
Subject: [Ferret-talk] getting fuzzy search to work
In-Reply-To:
References:
Message-ID:

Seems we have the same question. I'll keep my eyes on this post.
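Andrew's question turns on the difference between a fuzzy term (`test~`) and a wildcard (`*test*`). The sketch below models the two in plain Ruby, assuming a Lucene-style fuzzy similarity of 1 - distance / length-of-shorter-term, a default minimum similarity of 0.5 and no required prefix. These are assumptions for illustration, not Ferret's C implementation, and the sketch only shows which terms each query form would match; it is not a diagnosis of the empty result.

```ruby
# Classic dynamic-programming Levenshtein distance
# (insertions, deletions and substitutions each cost 1).
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Assumed Lucene-style similarity with no required prefix.
def fuzzy_similarity(query_term, index_term)
  shorter = [query_term.length, index_term.length].min
  if shorter.zero?
    return query_term == index_term ? 1.0 : 0.0
  end
  1.0 - edit_distance(query_term, index_term).to_f / shorter
end

# "test~": keep index terms whose similarity is above min_similarity.
def fuzzy_matches(query_term, terms, min_similarity = 0.5)
  terms.select { |t| fuzzy_similarity(query_term, t) > min_similarity }
end

# "*test*": keep every index term that contains the fragment.
def wildcard_matches(fragment, terms)
  terms.select { |t| t.include?(fragment) }
end
```

With the terms `test text tests testing best contest`, the fuzzy form keeps `test text tests best` (one edit away), while the wildcard keeps `test tests testing contest`. So the two forms are not interchangeable: a longer word such as `testing` is three edits from `test` and falls below the fuzzy threshold.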
From kraemer at webit.de Mon Mar 26 03:48:48 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 26 Mar 2007 09:48:48 +0200
Subject: [Ferret-talk] [AAF] acts_as_ferret 0.4.0 released
Message-ID: <20070326074848.GL30747@cordoba.webit.de>

Hi folks!

Just wanted to let you know that I released aaf 0.4.0 last weekend. Besides the DRb server it also includes a new lazy loading feature that lets you do Ferret searches without actually loading any records from the DB. Useful e.g. for live searches:

model:

  class MyModel < ActiveRecord::Base
    acts_as_ferret :fields => { :title => { :store => :yes }, :content => {} }
  end

controller:

  results = MyModel.find_by_contents(query, :lazy => true)

You can use the result objects as you would your AR records. If you query any attribute not stored in the index, the whole record will be loaded from the database. So, as long as you only access record.id and record.title in the example, no DB call will be made.

I'll post some more documentation about this soon. For now, there's some more info about the DRb server on my blog:

http://www.jkraemer.net/2007/3/24/acts_as_ferret-0-4-0-rie

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From senser.simon at gmail.com Mon Mar 26 04:54:12 2007
From: senser.simon at gmail.com (Jin)
Date: Mon, 26 Mar 2007 10:54:12 +0200
Subject: [Ferret-talk] [AAF] acts_as_ferret 0.4.0 released
In-Reply-To: <20070326074848.GL30747@cordoba.webit.de>
References: <20070326074848.GL30747@cordoba.webit.de>
Message-ID:

Cheers for the new release, but I still have some questions about how aaf works. My search feature has to save the tag words in a table, just like acts_as_taggable, because the customer needs a "hot tags" feature. How could I use the database instead of the file system to store the index or tags?
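The `:lazy => true` option announced above can be pictured as a small proxy object. The sketch below is a hypothetical illustration (`LazyResult` is not an acts_as_ferret class, and this is not how aaf implements it): attribute reads are answered from the fields stored in the index, and only a read of an unstored attribute triggers the one-time load of the full record. That load is supplied as a block here instead of a real database call.

```ruby
# Hypothetical proxy illustrating lazy loading; not acts_as_ferret's code.
class LazyResult
  # stored_fields: values kept in the index, e.g. { id: 1, title: "..." }
  # loader: block returning the full record; called at most once, on demand.
  def initialize(stored_fields, &loader)
    @stored = stored_fields
    @loader = loader
    @record = nil
  end

  def loaded?
    !@record.nil?
  end

  # Stored fields never touch the loader; anything else loads the full
  # record once and delegates to it.
  def method_missing(name, *args, &block)
    return @stored[name] if args.empty? && block.nil? && @stored.key?(name)
    @record ||= @loader.call
    @record.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @stored.key?(name) || super
  end
end

# Usage: the Struct stands in for an ActiveRecord model.
Post = Struct.new(:id, :title, :content)
db_loads = 0
result = LazyResult.new({ id: 1, title: "Ferret 0.11" }) do
  db_loads += 1 # stands in for the database query
  Post.new(1, "Ferret 0.11", "full body text")
end
result.title   # answered from the stored fields; db_loads stays 0
result.content # not stored, so the record is loaded; db_loads becomes 1
```

This is why the example stores `:title` with `:store => :yes`: only stored fields can be answered without the database.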
From kraemer at webit.de Mon Mar 26 08:01:50 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 26 Mar 2007 14:01:50 +0200
Subject: [Ferret-talk] [AAF] acts_as_ferret 0.4.0 released
In-Reply-To:
References: <20070326074848.GL30747@cordoba.webit.de>
Message-ID: <20070326120150.GM30747@cordoba.webit.de>

On Mon, Mar 26, 2007 at 10:54:12AM +0200, Jin wrote:
> cheers for new release
> but still got some questions about the mechanizm of aaf
> my task of search function has to save the tag words in table just like
> a_a_taggable because customer need hot tags function
> i wanna know how could i use database instead of file system to store
> index or tags

I'm not sure I understand what you want to achieve, but I'll try... If you want to search your tags with Ferret, just define a method that returns a string with all tags of your object, and index that method by simply specifying its name in the field list:

  class Model < ActiveRecord::Base
    acts_as_ferret :fields => [ :name, :tag_string ]

    def tag_string
      tags.map(&:name).join(' ')
    end
  end

Does that answer your question?

Jens

From kraemer at webit.de Mon Mar 26 08:04:01 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 26 Mar 2007 14:04:01 +0200
Subject: [Ferret-talk] how to select only some fields?
In-Reply-To: <196be902fa2b455da8e6f40ef9c1bc34@ruby-forum.com>
References: <196be902fa2b455da8e6f40ef9c1bc34@ruby-forum.com>
Message-ID: <20070326120401.GN30747@cordoba.webit.de>

On Fri, Mar 23, 2007 at 11:35:20PM +0100, david wrote:
> hi, how can i select only some fields when ferret do the query? like a
> :select => 'id', for the find....i've tried to do it where i do the
> :include => [:something], but it doesn't work...is it right or i've to
> put it in another place?
The right place is the second options hash (find_options, the third argument) of find_by_contents:

  Model.find_by_contents(query, {}, { :select => 'id' })

I've never tried it with :select, but if it's a valid option to find(:all), it should work.

Jens

From andreas.korth at gmx.net Mon Mar 26 11:20:47 2007
From: andreas.korth at gmx.net (Andreas Korth)
Date: Mon, 26 Mar 2007 17:20:47 +0200
Subject: [Ferret-talk] Multiple servers for one index
In-Reply-To: <69969158b7928093d5a45ab31117a812@ruby-forum.com>
References: <69969158b7928093d5a45ab31117a812@ruby-forum.com>
Message-ID:

On Mar 23, 2007, at 5:12 PM, Sebastien Pahl wrote:

> Hi,
>
> I'm currently trying to set up a solution involving multiple servers
> using the same index over nfs.
> The problem is that from what I have seen, ferret doesn't support
> multiple processes writing to the same index.
>
> Using a DRb service is not an option since this would create a single
> point of failure.

Did I miss something, or is your NFS volume exactly that: a single point of failure? I think you ruled out the DRb solution too quickly. Shared resources on NFS volumes are always prone to failure. Plus it doesn't scale well, because too many processes accessing the index directory will inevitably lead to poor performance or a complete deadlock.

I've come to the conclusion that the "Share Nothing" approach works best and SOAs are the way to go. I prefer talking to a single index server and not worrying about the details. I don't care whether it is a single server or a load-balanced cluster that services my request.

--
Andy

From no at spam.thanks Mon Mar 26 12:09:41 2007
From: no at spam.thanks (david)
Date: Mon, 26 Mar 2007 18:09:41 +0200
Subject: [Ferret-talk] how to select only some fields?
In-Reply-To: <20070326120401.GN30747@cordoba.webit.de>
References: <196be902fa2b455da8e6f40ef9c1bc34@ruby-forum.com> <20070326120401.GN30747@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
>
> the right place is the second argument hash of find_by_contents:
>
> Model.find_by_contents(query, {}, { :select => 'id' })
> I didn't ever try with select but if it's a valid option to find :all,
> it should work.
>
> Jens
>

I tried with this:

  self.find_by_contents(query, options, {:select => 'users.nick', :include => [:user, :categories]})

and it doesn't work. :(

From senser.simon at gmail.com Mon Mar 26 22:14:37 2007
From: senser.simon at gmail.com (Jin)
Date: Tue, 27 Mar 2007 04:14:37 +0200
Subject: [Ferret-talk] [AAF] acts_as_ferret 0.4.0 released
In-Reply-To: <20070326120150.GM30747@cordoba.webit.de>
References: <20070326074848.GL30747@cordoba.webit.de> <20070326120150.GM30747@cordoba.webit.de>
Message-ID: <6f6b7aaf990e5531b5f136873f214a69@ruby-forum.com>

Thank you, Jens, I saw your reply in both places. What I mean by the hot tags feature is this: if users search for 'demo1' with high frequency and for 'pika' with low frequency, then when we rank them we can tell which one is used more often. For example, we could list the top ten tags, but not all of them.

I also found that aaf has some specific fields such as ferret_score and ferret_rank. Are those useful in this case?

Looking forward to your response.

From kraemer at webit.de Tue Mar 27 04:29:24 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 27 Mar 2007 10:29:24 +0200
Subject: [Ferret-talk] how to select only some fields?
In-Reply-To:
References: <196be902fa2b455da8e6f40ef9c1bc34@ruby-forum.com> <20070326120401.GN30747@cordoba.webit.de>
Message-ID: <20070327082924.GR30747@cordoba.webit.de>

On Mon, Mar 26, 2007 at 06:09:41PM +0200, david wrote:
> Jens Kraemer wrote:
> >
> > the right place is the second argument hash of find_by_contents:
> >
> > Model.find_by_contents(query, {}, { :select => 'id' })
> > I didn't ever try with select but if it's a valid option to find :all,
> > it should work.
> >
> > Jens
> >
>
> i tried with this: self.find_by_contents(query, options, {:select =>
> 'users.nick', :include => [:user, :categories]}) and it doesn't work..

I just tried this myself and you're right, it doesn't work: all fields get selected.

If you try this with plain ActiveRecord's find method, you'll see that even then your :select option is ignored when an :include option is given. So unfortunately there's nothing acts_as_ferret can do about this.

Jens

From kraemer at webit.de Tue Mar 27 04:40:32 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 27 Mar 2007 10:40:32 +0200
Subject: [Ferret-talk] [AAF] acts_as_ferret 0.4.0 released
In-Reply-To: <6f6b7aaf990e5531b5f136873f214a69@ruby-forum.com>
References: <20070326074848.GL30747@cordoba.webit.de> <20070326120150.GM30747@cordoba.webit.de> <6f6b7aaf990e5531b5f136873f214a69@ruby-forum.com>
Message-ID: <20070327084032.GS30747@cordoba.webit.de>

On Tue, Mar 27, 2007 at 04:14:37AM +0200, Jin wrote:
> thank you jens,I saw ur reply both sides
> for this hot tags function that is if user search 'demo1' with high
> frequency
> and search 'pika' with low frequency then when we wanna make a rank for
> them
> we can know which is used high frequency
> for example,we can list the top ten tags but
> not all
>
> and i found in our aaf has some specific fields such like
> ferret_score,ferret_rank
> is that useful for this case?

No, I don't think so. These fields get set by acts_as_ferret for the results of a query: score is the score as computed by Ferret, and rank is the position of the result in the result set as delivered by Ferret (the order when sorting by rank may differ from the order when sorting by score if you told Ferret to sort results by one or more fields).

To find out the most used tags I'd probably just use the database; if you're using acts_as_taggable or something similar, this should be easy to do.

Jens

From kraemer at webit.de Tue Mar 27 04:53:50 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 27 Mar 2007 10:53:50 +0200
Subject: [Ferret-talk] Multiple servers for one index
In-Reply-To:
References: <69969158b7928093d5a45ab31117a812@ruby-forum.com>
Message-ID: <20070327085350.GT30747@cordoba.webit.de>

On Mon, Mar 26, 2007 at 05:20:47PM +0200, Andreas Korth wrote:
>
> On Mar 23, 2007, at 5:12 PM, Sebastien Pahl wrote:
>
> > Hi,
> >
> > I'm currently trying to set up a solution involving multiple servers
> > using the same index over nfs.
> > The problem is that from what I have seen, ferret doesn't support
> > multiple processes writing to the same index.
> >
> > Using a DRb service is not an option since this would create a single
> > point of failure.
>
> Did I miss something or is your NFS volume exactly that: a single
> point of failure. I think you ruled out the DRb solution too quickly.
> Shared resources on NFS volumes are always prone to failure.
> Plus it doesn't scale well because too many processes accessing the index
> directory will inevitably lead to poor performance or a complete
> deadlock.
>
> I've come to the conclusion that the "Share Nothing" approach works
> best and SOAs are the way to go. I prefer talking to a single index
> server and not worry about the details. I don't care whether it is a
> single server or a load balanced cluster that services my request.

Full ack :-)

I don't know how big you expect your index to grow, and how critical it is that it's always up to date, but wouldn't it be sufficient to have a backup system with a nightly snapshot of the index that could jump in if the production server fails? You could even run continuous rebuilds on that backup server to keep the index fairly in sync...

Jens

From senser.simon at gmail.com Tue Mar 27 08:36:21 2007
From: senser.simon at gmail.com (Jin)
Date: Tue, 27 Mar 2007 14:36:21 +0200
Subject: [Ferret-talk] [AAF] acts_as_ferret 0.4.0 released
In-Reply-To: <20070327084032.GS30747@cordoba.webit.de>
References: <20070326074848.GL30747@cordoba.webit.de> <20070326120150.GM30747@cordoba.webit.de> <6f6b7aaf990e5531b5f136873f214a69@ruby-forum.com> <20070327084032.GS30747@cordoba.webit.de>
Message-ID: <0dc8dea7e9caae6cbfe6c857df8f253f@ruby-forum.com>

Jens Kraemer wrote:
> On Tue, Mar 27, 2007 at 04:14:37AM +0200, Jin wrote:
>> is that useful for this case?
> No, I don't think so.
> These fields get set by acts_as_ferret for the
> results of a query - score is the score as computed by ferret, and rank
> the position of the result in the result set, as delivered by ferret
> (the order when sorting by rank may be different from the one when
> sorting by score if you told ferret to sort results by one or more
> fields).
>
> To find out the most used tags I'd probably just use the database, if
> you're using acts_as_taggable or something similar this should be easy
> to do.
>
> Jens
>

Thank you, Jens. After some more work I think I have found the way to the final solution. Our project kicks off tomorrow, so thank you! I will build a strong search feature with your help :)

From no at spam.thanks Tue Mar 27 08:39:52 2007
From: no at spam.thanks (david)
Date: Tue, 27 Mar 2007 14:39:52 +0200
Subject: [Ferret-talk] how to select only some fields?
In-Reply-To: <20070327082924.GR30747@cordoba.webit.de>
References: <196be902fa2b455da8e6f40ef9c1bc34@ruby-forum.com> <20070326120401.GN30747@cordoba.webit.de> <20070327082924.GR30747@cordoba.webit.de>
Message-ID: <71f071448709e8c3833d07d6908b5809@ruby-forum.com>

Jens Kraemer wrote:
> I just tried this myself and you're right, it doesn't work - that is,
> all fields get selected.
>
> If you try this with plain active record's find method, you'll see that
> even then your :select option will be ignored when an :include option is
> given. So unfortunately there's nothing acts_as_ferret can do about
> this.
>

So should I use :joins instead of :include?

  :joins: An SQL fragment for additional joins like "LEFT JOIN comments ON comments.post_id = id". (Rarely needed). The records will be returned read-only since they will have attributes that do not correspond to the table's columns. Pass :readonly => false to override.

I don't need to edit them, just to show them, so a simple :joins could be OK, without :readonly => false... I'll try :)
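Jens's advice in this thread, to work out the most used tags in the database, boils down to a GROUP BY / COUNT / ORDER BY / LIMIT query. The plain-Ruby sketch below performs the same aggregation in memory so it can run on its own. `hot_tags` is a hypothetical helper, and the SQL in the comment assumes an acts_as_taggable-style `tags`/`taggings` schema. If "hot" should mean frequently *searched* terms rather than frequently used tags, the same counting applies to a log of submitted queries.

```ruby
# Hypothetical helper. In production the database would do this, e.g.
#   SELECT tags.name, COUNT(*) AS uses
#   FROM taggings JOIN tags ON tags.id = taggings.tag_id
#   GROUP BY tags.name ORDER BY uses DESC LIMIT 10
# (table and column names assume an acts_as_taggable-style schema).
def hot_tags(tag_names, limit = 10)
  counts = Hash.new(0)
  tag_names.each { |name| counts[name] += 1 }
  # Highest count first; ties broken alphabetically so the output is stable.
  counts.sort_by { |name, uses| [-uses, name] }.first(limit)
end

used = %w[demo1 pika demo1 ruby demo1 pika]
hot_tags(used, 2) # => [["demo1", 3], ["pika", 2]]
```

Pushing the counting into SQL avoids loading every tagging row into Ruby. The in-memory version is only useful for small data or tests.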
From laurent.farcy at neuf.fr Tue Mar 27 09:57:15 2007
From: laurent.farcy at neuf.fr (Laurent Lau)
Date: Tue, 27 Mar 2007 15:57:15 +0200
Subject: [Ferret-talk] Any chance to get 0.11.3 on windows soon ?
In-Reply-To: <56a3755138984d7014dc85d7095bf90a@ruby-forum.com>
References: <56a3755138984d7014dc85d7095bf90a@ruby-forum.com>
Message-ID:

Is nobody indexing European content on Windows with 0.10.x? :(

Laurent

From tudor_prisacariu at yahoo.com Tue Mar 27 12:36:19 2007
From: tudor_prisacariu at yahoo.com (Tudor)
Date: Tue, 27 Mar 2007 18:36:19 +0200
Subject: [Ferret-talk] multi_search problems
Message-ID: <19e7b2a10990f1df5f94b2bd0ed4484b@ruby-forum.com>

Hello. I've been trying to get multi_search to work and I simply can't. I have two models:

  Post
    acts_as_ferret :fields => [:title, :body], :store_class_name => true
  Page
    acts_as_ferret :fields => [:title, :body], :store_class_name => true

If I do

  @results = Post.find_by_contents(params[:q])

or

  @results = Page.find_by_contents(params[:q])

it works fine, but if I try to search in both models

  @results = Page.multi_search(params[:q],[Post])

I get the following error:

nil is not a symbol
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:438:in `const_get'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:438:in `multi_search'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:460:in `id_multi_search'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/multi_index.rb:28:in `search_each'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/multi_index.rb:28:in `search_each'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:457:in `id_multi_search'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/class_methods.rb:437:in `multi_search'
#{RAILS_ROOT}/app/controllers/site_controller.rb:121:in `search'
-e:3:in `load'
-e:3

The interesting part is that this only happens when it also finds a Page that matches the query.
If it only finds Posts, there are no errors. Post is part of a has_many :through relation, so my guess is that this could have something to do with it, but I don't know what. The indexes are OK for both models. I'm using ferret-0.10.9-mswin32 and the latest version of the acts_as_ferret plugin.

Any help would be greatly appreciated!

From veeraa2003 at yahoo.co.in Tue Mar 27 14:33:50 2007
From: veeraa2003 at yahoo.co.in (Veera Sundaravel)
Date: Tue, 27 Mar 2007 20:33:50 +0200
Subject: [Ferret-talk] Error in installling gem install acts_as_ferret
Message-ID: <8cc8a70f174f79d10bf86b111fad9d59@ruby-forum.com>

Hello everybody!

I tried to install the ferret gem on my local machine. It goes like this:

[root at dhcppc2 ~]# gem install ferret
Need to update 1 gems from http://gems.rubyforge.org
.
complete
Select which gem to install for your platform (i686-linux)
 1. ferret 0.11.3 (ruby)
 2. ferret 0.11.2 (ruby)
 3. ferret 0.11.1 (ruby)

>>1

Building native extensions. This could take a while...
ruby extconf.rb install ferret
creating Makefile

make
make install
/usr/bin/install -c -m 0755 ferret_ext.so /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib

make clean
Successfully installed ferret-0.11.3
Installing ri documentation for ferret-0.11.3...
Installing RDoc documentation for ferret-0.11.3...

But when I try to install acts_as_ferret:

[root at dhcppc2 ~]# gem install acts_as_ferret
Bulk updating Gem source index for: http://gems.rubyforge.org
Successfully installed acts_as_ferret-0.4.0
Installing ri documentation for acts_as_ferret-0.4.0...

lib/index.rb:23:41: ':' not followed by identified or operator
Installing RDoc documentation for acts_as_ferret-0.4.0...

lib/index.rb:23:41: ':' not followed by identified or operator
[root at dhcppc2 ~]#

I want to know whether both ferret and acts_as_ferret installed correctly or not. Please let me know whether I have to make any changes in my environment.rb file to enable Ferret.
Thanks with regards,
Veeraa

From tudor_prisacariu at yahoo.com Tue Mar 27 16:30:38 2007
From: tudor_prisacariu at yahoo.com (Tudor)
Date: Tue, 27 Mar 2007 22:30:38 +0200
Subject: [Ferret-talk] multi_search problems
In-Reply-To: <19e7b2a10990f1df5f94b2bd0ed4484b@ruby-forum.com>
References: <19e7b2a10990f1df5f94b2bd0ed4484b@ruby-forum.com>
Message-ID:

Right, I realized that I didn't have the latest version installed. I installed from svn and everything seems to be working OK now. Joy :)

From senser.simon at gmail.com Wed Mar 28 00:21:47 2007
From: senser.simon at gmail.com (Jin)
Date: Wed, 28 Mar 2007 06:21:47 +0200
Subject: [Ferret-talk] [AAF] acts_as_ferret 0.4.0 released
In-Reply-To: <0dc8dea7e9caae6cbfe6c857df8f253f@ruby-forum.com>
References: <20070326074848.GL30747@cordoba.webit.de> <20070326120150.GM30747@cordoba.webit.de> <6f6b7aaf990e5531b5f136873f214a69@ruby-forum.com> <20070327084032.GS30747@cordoba.webit.de> <0dc8dea7e9caae6cbfe6c857df8f253f@ruby-forum.com>
Message-ID:

Sorry, I'm still struggling in the mud of search. I have now created three tables: members, tags and taggings. The last two are nearly the same as in acts_as_taggable, and I use them to store specific information. The members table stores member information. The keywords come from the page and are stored only in the tags table, so the keywords are not stored in the member fields. In SQL we would find the tag first, then the taggings, then the related records.

One requirement really challenged me: the search results must be displayed in a particular order. If the query is the three words a, b and c, the results must be the records that satisfy these conditions:

1. a and b and c first;
2. then a and b, b and c, a and c, excluding the records already listed above;
3. last, a, b or c alone, excluding the records already listed above.

Any suggestions?
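The ordering Jin describes (all three words first, then any two, then any one) is the same as sorting the matching records by how many distinct query terms each one contains. A plain-Ruby sketch, assuming each record exposes its tags as an array of strings. `rank_by_matched_terms` and the hash layout are hypothetical; in the app, the candidate records would come from the tags/taggings lookup described above.

```ruby
# Hypothetical helper, not part of acts_as_ferret: order records by how many
# distinct query terms appear in their tags (all terms first, then any two,
# then any one). Records matching no term are dropped. Within a group the
# incoming order is kept, so an existing relevance order survives.
def rank_by_matched_terms(records, terms)
  wanted = terms.map(&:downcase).uniq
  records
    .map { |rec| [rec, (rec[:tags].map(&:downcase) & wanted).size] }
    .reject { |_rec, hits| hits.zero? }
    .each_with_index
    .sort_by { |(_rec, hits), index| [-hits, index] }
    .map { |(rec, _hits), _index| rec }
end
```

Because each record lands in exactly one group, the "excluding the records already listed above" conditions come for free: a record matching all three words is never repeated in the two-word or one-word groups.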
From kraemer at webit.de Wed Mar 28 04:37:04 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 28 Mar 2007 10:37:04 +0200
Subject: [Ferret-talk] Error in installling gem install acts_as_ferret
In-Reply-To: <8cc8a70f174f79d10bf86b111fad9d59@ruby-forum.com>
References: <8cc8a70f174f79d10bf86b111fad9d59@ruby-forum.com>
Message-ID: <20070328083704.GC5810@cordoba.webit.de>

Hi!

This looks good; it's just some warnings that occur when generating the rdocs.

Jens

On Tue, Mar 27, 2007 at 08:33:50PM +0200, Veera Sundaravel wrote:
> Hello everybody!
>
>
> I try to install ferret gem in my local machine .
>
> It goes like this :
>
>
> [root at dhcppc2 ~]# gem install ferret
> Need to update 1 gems from http://gems.rubyforge.org
> .
> complete
> Select which gem to install for your platform (i686-linux)
> 1. ferret 0.11.3 (ruby)
> 2. ferret 0.11.2 (ruby)
> 3. ferret 0.11.1 (ruby)
>
> >>1
>
> Building native extensions. This could take a while...
> ruby extconf.rb install ferret
> creating Makefile
>
> make
> make install
> /usr/bin/install -c -m 0755 ferret_ext.so
> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib
>
> make clean
> Successfully installed ferret-0.11.3
> Installing ri documentation for ferret-0.11.3...
> Installing RDoc documentation for ferret-0.11.3...
>
>
> But when I try to install acts_as_ferret :
>
>
> [root at dhcppc2 ~]# gem install acts_as_ferret
> Bulk updating Gem source index for: http://gems.rubyforge.org
> Successfully installed acts_as_ferret-0.4.0
> Installing ri documentation for acts_as_ferret-0.4.0...
>
> lib/index.rb:23:41: ':' not followed by identified or operator
> Installing RDoc documentation for acts_as_ferret-0.4.0...
>
> lib/index.rb:23:41: ':' not followed by identified or operator
> [root at dhcppc2 ~]#
>
>
> Here I want to know whether both ferret and act_as_ferret installed
> corretly or not. Please let me explain whether I have to make any
> changes in my environment.rb file to enable this ferret.
> > > Thanks with Regards, > Veeraa > > -- > Posted via http://www.ruby-forum.com/. -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From senser.simon at gmail.com Wed Mar 28 05:41:52 2007 From: senser.simon at gmail.com (Jin) Date: Wed, 28 Mar 2007 11:41:52 +0200 Subject: [Ferret-talk] Error in installling gem install acts_as_ferret In-Reply-To: <8cc8a70f174f79d10bf86b111fad9d59@ruby-forum.com> References: <8cc8a70f174f79d10bf86b111fad9d59@ruby-forum.com> Message-ID: I think you can check it out from svn and copy it directly into the plugin folder. Veera Sundaravel wrote: > Hello everybody! > > > I try to install ferret gem in my local machine . > > It goes like this : > > > [root at dhcppc2 ~]# gem install ferret > Need to update 1 gems from http://gems.rubyforge.org -- Posted via http://www.ruby-forum.com/. From alexkane at gmail.com Wed Mar 28 06:31:12 2007 From: alexkane at gmail.com (Alex Kane) Date: Wed, 28 Mar 2007 12:31:12 +0200 Subject: [Ferret-talk] Newbie problem on production server Message-ID: <12d3825450af0de0f6a367f938727f47@ruby-forum.com> Hi, I just installed ferret for the first time and integrated it with my app. On my dev machine it's fine but on my production server I get this when I call find_by_contents(): Processing LinksController#results (for 24.185.105.59 at 2007-03-28 05:28:36) [POST] Session ID: 3f2dc7c17147c0e52178ba697a119833 Parameters: {"commit"=>"Search", "action"=>"results", "controller"=>"links", "link"=>{"search"=>"test"}} IOError (IO Error occured at :93 in xraise Error occured in index.c:886 - sis_find_segments_file Error reading the segment infos.
Store listing was ): /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:706:in `initialize' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:706:in `new' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:706:in `ensure_reader_open' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:713:in `ensure_searcher_open' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:658:in `process_query' /usr/local/lib/ruby/1.8/monitor.rb:238:in `synchronize' /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:657:in `process_query' /vendor/plugins/acts_as_ferret/lib/local_index.rb:87:in `find_id_by_contents' /vendor/plugins/acts_as_ferret/lib/class_methods.rb:82:in `find_id_by_contents' /vendor/plugins/acts_as_ferret/lib/class_methods.rb:134:in `ar_find_by_contents' /vendor/plugins/acts_as_ferret/lib/class_methods.rb:128:in `find_records_lazy_or_not' /vendor/plugins/acts_as_ferret/lib/class_methods.rb:54:in `find_by_contents' /app/controllers/links_controller.rb:11:in `results' -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed Mar 28 07:45:33 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 28 Mar 2007 13:45:33 +0200 Subject: [Ferret-talk] Newbie problem on production server In-Reply-To: <12d3825450af0de0f6a367f938727f47@ruby-forum.com> References: <12d3825450af0de0f6a367f938727f47@ruby-forum.com> Message-ID: <20070328114533.GB19868@cordoba.webit.de> On Wed, Mar 28, 2007 at 12:31:12PM +0200, Alex Kane wrote: > Hi, > I just installed ferret for the first time and integrated it with my > app. 
On my dev machine it's fine but on my production server I get this > when I call find_by_contents(): > > Processing LinksController#results (for 24.185.105.59 at 2007-03-28 > 05:28:36) [POST] > Session ID: 3f2dc7c17147c0e52178ba697a119833 > Parameters: {"commit"=>"Search", "action"=>"results", > "controller"=>"links", "link"=>{"search"=>"test"}} > > > IOError (IO Error occured at :93 in xraise > Error occured in index.c:886 - sis_find_segments_file > Error reading the segment infos. Store listing was > > > ): Looks like the index can't be created. A typical problem is that the user your application runs as does not have the privileges to create the index directory (RAILS_ROOT/index/production/model_name) and/or to write to it. Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From alexkane at gmail.com Wed Mar 28 11:16:23 2007 From: alexkane at gmail.com (Alex Kane) Date: Wed, 28 Mar 2007 17:16:23 +0200 Subject: [Ferret-talk] Newbie problem on production server In-Reply-To: <20070328114533.GB19868@cordoba.webit.de> References: <12d3825450af0de0f6a367f938727f47@ruby-forum.com> <20070328114533.GB19868@cordoba.webit.de> Message-ID: <6c7a97c1c379e8b6b79a623e31abf752@ruby-forum.com> I checked the permissions and I'm able to write to this path, so it must be something else. ~ Alex Jens Kraemer wrote: > On Wed, Mar 28, 2007 at 12:31:12PM +0200, Alex Kane wrote: >> >> >> IOError (IO Error occured at :93 in xraise >> Error occured in index.c:886 - sis_find_segments_file >> Error reading the segment infos. Store listing was >> >> >> ): > > Looks like the index can't be created.
> > a typical problem is that the user your application is running as does > not have the privileges to create the index directory > (RAILS_ROOT/index/production/model_name) and/or write to this directory. > > Jens > > > -- > Jens Krämer > webit! Gesellschaft für neue Medien mbH > Schnorrstraße 76 | 01069 Dresden > Telefon +49 351 46766-0 | Telefax +49 351 46766-66 > kraemer at webit.de | www.webit.de > > Amtsgericht Dresden | HRB 15422 > GF Sven Haubold, Hagen Malessa -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed Mar 28 11:33:00 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 28 Mar 2007 17:33:00 +0200 Subject: [Ferret-talk] Newbie problem on production server In-Reply-To: <6c7a97c1c379e8b6b79a623e31abf752@ruby-forum.com> References: <12d3825450af0de0f6a367f938727f47@ruby-forum.com> <20070328114533.GB19868@cordoba.webit.de> <6c7a97c1c379e8b6b79a623e31abf752@ruby-forum.com> Message-ID: <20070328153300.GH19868@cordoba.webit.de> On Wed, Mar 28, 2007 at 05:16:23PM +0200, Alex Kane wrote: > I checked the permissions and I'm able to write to this path, so it must > be something else. What happens if you run Model.rebuild_index on the console? Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From alex at pressure.to Wed Mar 28 14:30:36 2007 From: alex at pressure.to (Alex Fenton) Date: Wed, 28 Mar 2007 19:30:36 +0100 Subject: [Ferret-talk] retrieving search result positions Message-ID: <460AB44C.4090002@pressure.to> Hi I'm considering using Ferret in v2 of Weft QDA, a wxruby desktop application for textual analysis in social science. Ferret seems a very impressive package that meets and exceeds my requirements, but I can't find how to retrieve specific details about the results. I'd like to be able to run fairly simple queries.
I then need to look at each term match, and get its document id and the character (not byte) position at which it occurs in the source document. My semi-illiterate reading of search.c suggests this is available, but looking at the SearchHits returned by a SpanTermQuery, they don't seem to contain the methods I'm looking for. Thanks for any help. alex From manoel at lemos.net Wed Mar 28 16:55:07 2007 From: manoel at lemos.net (Manoel Lemos) Date: Wed, 28 Mar 2007 22:55:07 +0200 Subject: [Ferret-talk] Questions on tokenized x untokenized and date sorting Message-ID: Gents, will this definition allow me to search inside title, sub_title and url, and to sort by score, rank_sort and last_updated_at_sort?

    acts_as_ferret :fields => {
                     :title                => {:boost => 2, :store => :yes},
                     :sub_title            => {:store => :yes},
                     :url                  => {:store => :yes},
                     :rank_sort            => {:index => :untokenized},
                     :last_updated_at_sort => {:index => :untokenized_omit_norms,
                                               :term_vector => :no}},
                   :remote => true

    def rank_sort
      begin
        return self.rank_links.to_i
      rescue
        return nil
      end
    end

    def last_updated_at_sort
      begin
        self.last_updated_at.to_i
      rescue
        return nil
      end
    end

-- Posted via http://www.ruby-forum.com/. From ryansking at gmail.com Wed Mar 28 17:55:17 2007 From: ryansking at gmail.com (Ryan King) Date: Wed, 28 Mar 2007 14:55:17 -0700 Subject: [Ferret-talk] trouble with PerFieldAnalyzer Message-ID: <846f30c70703281455i4231ef18jba0dca56306d6bcd@mail.gmail.com> I'm having trouble with PerFieldAnalyzer (ferret version 0.10.14). Script:

    require 'rubygems'
    require 'ferret'
    require 'pp'

    include Ferret::Analysis
    include Ferret::Index

    class TestAnalyzer
      def token_stream field, input
        pp field
        pp input
        LetterTokenizer.new(input)
      end
    end

    pfa = PerFieldAnalyzer.new(StandardAnalyzer.new())
    pfa[:test] = TestAnalyzer.new

    index = Index.new(:analyzer => pfa)
    index << {:test => 'foo'}
    index.search_each('bar')

Output:

    :test
    ""
    :test
    "bar"

Why is input "" the first time token_stream is called?
I hope that the answer isn't "upgrade to 0.11". :( -ryan From senser.simon at gmail.com Wed Mar 28 21:51:30 2007 From: senser.simon at gmail.com (Jin) Date: Thu, 29 Mar 2007 03:51:30 +0200 Subject: [Ferret-talk] Newbie problem on production server In-Reply-To: <20070328153300.GH19868@cordoba.webit.de> References: <12d3825450af0de0f6a367f938727f47@ruby-forum.com> <20070328114533.GB19868@cordoba.webit.de> <6c7a97c1c379e8b6b79a623e31abf752@ruby-forum.com> <20070328153300.GH19868@cordoba.webit.de> Message-ID: <788cdcf596143624c14d8a41af62d4c7@ruby-forum.com> I also can't run Model.rebuild_index in the list action; it reports an IO exception. Jens Kraemer wrote: > On Wed, Mar 28, 2007 at 05:16:23PM +0200, Alex Kane wrote: >> I checked the permissions and I'm able to write to this path, so it must >> be something else. > > what happens if you run Model.rebuild_index on the console? > > Jens -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu Mar 29 04:11:41 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 29 Mar 2007 10:11:41 +0200 Subject: [Ferret-talk] retrieving search result positions In-Reply-To: <460AB44C.4090002@pressure.to> References: <460AB44C.4090002@pressure.to> Message-ID: <20070329081141.GJ19868@cordoba.webit.de> On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote: > Hi > > I'm considering using Ferret in v2 of Weft QDA, a wxruby desktop > application for textual analysis in social science. > > Ferret seems a very impressive package that meets and exceeds my > requirements, but I can't find how to retrieve specific details about > the results. > > I'd like to be able to run fairly simple queries. I then need to look at > each term match, and get its document id and the character (not byte) > position at which it occurs in the source document.
> > My semi-illiterate reading of search.c suggests this is available, but > looking at the SearchHits returned by a SpanTermQuery, they don't seem > to contain the methods I'm looking for. Without fully understanding what you want to achieve, I guess TermVectors are what you're looking for. I'm not sure if they're working on characters or bytes, though. Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Thu Mar 29 04:15:50 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 29 Mar 2007 10:15:50 +0200 Subject: [Ferret-talk] Questions on tokenized x untokenized and date sorting In-Reply-To: References: Message-ID: <20070329081550.GK19868@cordoba.webit.de> On Wed, Mar 28, 2007 at 10:55:07PM +0200, Manoel Lemos wrote: > Gents, does this definition will allow me to search inside title, > sub_title and url and sort by score, rank_sort, last_updated_at_sort ? Looks good. Be sure to specify :int as the sort type when you sort by rank_sort or last_updated_at_sort, or left-pad the values to a fixed length with '0'. Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From zond at troja.ath.cx Thu Mar 29 04:40:36 2007 From: zond at troja.ath.cx (Martin Kihlgren) Date: Thu, 29 Mar 2007 10:40:36 +0200 Subject: [Ferret-talk] memory leaks in exception handling?
Message-ID: <20070329084036.GA16111@troja.ath.cx> I just posted a ticket regarding possible memory leaks in the c layer exception handling: http://ferret.davebalmain.com/trac/ticket/187 (This mail is just to draw attention to it :) regards, //Martin -- ################################################################### Things are more like they used to be than they are now. ################################################################### From peter.schrammel at web.de Thu Mar 29 07:07:00 2007 From: peter.schrammel at web.de (P. Schrammel) Date: Thu, 29 Mar 2007 13:07:00 +0200 Subject: [Ferret-talk] higlighting problem In-Reply-To: <20070309093821.GA20174@cordoba.webit.de> References: <3ba6234d1c29e0b97fbebe00d9a4d75a@ruby-forum.com> <20070309093821.GA20174@cordoba.webit.de> Message-ID: Jens Kraemer wrote: ... > results = Link.find_by_contents(query) > result = results.first > result.highlight(query, :field => :description) # returns nil > > doc_num = result.document_number > > # if you are on aaf trunk: > Link.aaf_index.ferret_index.highlight(query, doc_num, :field => > :description) > # if on aaf stable: > Link.ferret_index.highlight(query, doc_num, :field => :description) > > > this would directly use ferret's highlight method. Btw, what version of > aaf do you use? > ... Hi all, same Problem here...using gem version 0.11.3. I tried the above but still 'nil'. How can I find out if :store is set to :yes? Regards Peter -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu Mar 29 07:52:28 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 29 Mar 2007 13:52:28 +0200 Subject: [Ferret-talk] higlighting problem In-Reply-To: References: <3ba6234d1c29e0b97fbebe00d9a4d75a@ruby-forum.com> <20070309093821.GA20174@cordoba.webit.de> Message-ID: <20070329115228.GC26296@cordoba.webit.de> On Thu, Mar 29, 2007 at 01:07:00PM +0200, P. Schrammel wrote: > Jens Kraemer wrote: > ... 
> > results = Link.find_by_contents(query) > > result = results.first > > result.highlight(query, :field => :description) # returns nil > > > > doc_num = result.document_number > > > > # if you are on aaf trunk: > > Link.aaf_index.ferret_index.highlight(query, doc_num, :field => > > :description) > > # if on aaf stable: > > Link.ferret_index.highlight(query, doc_num, :field => :description) > > > > > > this would directly use ferret's highlight method. Btw, what version of > > aaf do you use? > > > ... > Hi all, > same Problem here...using gem version 0.11.3. I tried the above but > still 'nil'. > How can I find out if :store is set to :yes? Retrieve a Ferret document from the index and try to access the field in question:

    doc = index[doc_num]
    puts doc[:field]

If doc[:field] returns the contents of the field, the field is :store => :yes; if it returns nil, it isn't. Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From peter.schrammel at web.de Thu Mar 29 08:59:04 2007 From: peter.schrammel at web.de (P. Schrammel) Date: Thu, 29 Mar 2007 14:59:04 +0200 Subject: [Ferret-talk] higlighting problem In-Reply-To: <20070329115228.GC26296@cordoba.webit.de> References: <3ba6234d1c29e0b97fbebe00d9a4d75a@ruby-forum.com> <20070309093821.GA20174@cordoba.webit.de> <20070329115228.GC26296@cordoba.webit.de> Message-ID: Now it works... don't ask why. I'll try to explain what I did; perhaps you'll find the error: Created a Rails model:

    class Article < ActiveRecord::Base
      acts_as_ferret :fields => [:normalized_text],
                     :analyzer => HTMLAnalyzer.new
    end

-filled in the data.
Found docs saying that I should use :store
-called Article.delete_all on the Rails console
-rewrote article.rb:

    class Article < ActiveRecord::Base
      acts_as_ferret :fields => { :normalized_text => {:store => :yes }},
                     :analyzer => HTMLAnalyzer.new
    end

-filled in the data (highlight didn't work)
-did Article.delete_all again
-rm RAILS_ROOT/index/article/*
-filled in the data
-highlight works

Thanks Peter -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu Mar 29 09:20:45 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 29 Mar 2007 15:20:45 +0200 Subject: [Ferret-talk] higlighting problem In-Reply-To: References: <3ba6234d1c29e0b97fbebe00d9a4d75a@ruby-forum.com> <20070309093821.GA20174@cordoba.webit.de> <20070329115228.GC26296@cordoba.webit.de> Message-ID: <20070329132045.GF26296@cordoba.webit.de> On Thu, Mar 29, 2007 at 02:59:04PM +0200, P. Schrammel wrote: > Now it works...don't ask why. I'll try to explain what I did, perhaps > you'll find the error: Yeah, delete_all is evil because it only issues the SQL DELETE without calling the Rails callback methods for the deleted records, so there's no way for aaf to remove the old entries from the index. destroy_all would have worked. By deleting the index directory you forced aaf to rebuild the index, which then used the new aaf options. Long story short: calling Article.rebuild_index after changing the aaf options would have worked, too :-) Jens > > Created a rails model: > > class Article < ActiveRecord::Base > acts_as_ferret :fields => [:normalized_text], > :analyzer => HTMLAnalyzer.new > end > > -filled in the data.
Found docs that I should use :store > -called Article.delete_all on the rails console > -rewrote article.rb : > > class Article < ActiveRecord::Base > acts_as_ferret :fields => { :normalized_text => {:store => :yes }}, > :analyzer => HTMLAnalyzer.new > end > > -fill in the data (highlight didn't work) > -did Article.delete_all again > -rm RAILS_ROOT/index/article/* > -fill in the data > -highlight works > > Thanks > Peter > > -- > Posted via http://www.ruby-forum.com/. -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From levent at leventali.com Thu Mar 29 10:38:58 2007 From: levent at leventali.com (Levent) Date: Thu, 29 Mar 2007 16:38:58 +0200 Subject: [Ferret-talk] ferret on 64bit systems? In-Reply-To: References: <45FA6193.7040102@inforadical.net> Message-ID: David Balmain wrote: > Hi Caleb, > > I'll try and work on this warnings and put out a new release. Give me > a day or two. It is quite probable that these warnings are related to > the issues you are having. Thanks for posting them. > > Cheers, > Dave Hi, We're having similar issues and were wondering when a fix was likely? Cheers for all the hard work. Any news on an updated Windows version too? regards levent -- Posted via http://www.ruby-forum.com/.
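The 64-bit warnings that come up in this thread (format mismatches involving `off_t` and `long`, and pointer-to-integer casts) come down to C type widths. On 64-bit Linux (the LP64 model), `int` stays 4 bytes while `long`, `off_t` and pointers are 8 bytes, so a pointer cast to a narrower integer such as `int` can lose its upper half, and printing a `long` with `%d` mismatches sizes, whereas the `%lld`/`off_t` warnings pair two different 8-byte types. A small sketch for checking the widths on a given box, assuming Ruby 2.3 or later (needed for the `J` pack directive); it is a diagnostic aid, not part of Ferret:

```ruby
# Byte widths of some native C types in the running Ruby build, read via
# Array#pack's native-size directives ("J" needs Ruby 2.3 or later).
def native_type_sizes
  {
    int:       [0].pack("i!").bytesize, # C int
    long:      [0].pack("l!").bytesize, # C long
    long_long: [0].pack("q").bytesize,  # 64-bit integer (int64_t / long long)
    pointer:   [0].pack("J").bytesize   # uintptr_t, i.e. the pointer width
  }
end

info = native_type_sizes
puts "#{RUBY_PLATFORM} (ruby #{RUBY_VERSION})"
info.each { |type, size| puts format("%-10s %d bytes", type, size) }

if info[:pointer] > info[:int]
  puts "pointers are wider than int: casting one to int drops its upper bytes"
end
```

On an LP64 box such as Debian Etch on AMD64, one would expect 4, 8, 8 and 8 bytes respectively; on a 32-bit build all but `long_long` would be 4.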
From jeff.green at jgp.co.uk Thu Mar 29 10:44:37 2007 From: jeff.green at jgp.co.uk (Jeff Green) Date: Thu, 29 Mar 2007 15:44:37 +0100 Subject: [Ferret-talk] Nasty looking warnings on Debian Etch AMD64 bit box Message-ID: <38da7ce7aa27ccb6ccbd68625cf1d8f7460bd23d@jobsgopublic.com> Running gem install ferret and selecting 0.11.3 on a dual Xeon or dual Opteron 64-bit box running Debian Etch gives the following list of nasty-looking warnings. Is anyone running Ferret successfully on 64-bit Linux?

    Building native extensions. This could take a while...
    fs_store.c: In function ‘fso_seek_i’:
    fs_store.c:238: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    fs_store.c:238: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    fs_store.c: In function ‘fsi_seek_i’:
    fs_store.c:292: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    fs_store.c:292: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    ferret.c: In function ‘object_add2’:
    ferret.c:69: warning: cast from pointer to integer of different size
    ferret.c:69: warning: cast from pointer to integer of different size
    ferret.c: In function ‘object_del2’:
    ferret.c:88: warning: cast from pointer to integer of different size
    r_search.c: In function ‘frt_td_to_s’:
    r_search.c:202: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long int’
    compound_io.c: In function ‘cmpdi_read_i’:
    compound_io.c:135: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    compound_io.c:135: warning: format ‘%lld’ expects type ‘long long int’, but argument 5 has type ‘off_t’
    compound_io.c:135: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    compound_io.c:135: warning: format ‘%lld’ expects type ‘long long int’, but argument 5 has type ‘off_t’
    compound_io.c: In function ‘cw_copy_file’:
    compound_io.c:324: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    compound_io.c:324: warning: format ‘%lld’
    expects type ‘long long int’, but argument 4 has type ‘off_t’
    compound_io.c:333: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    compound_io.c:333: warning: format ‘%lld’ expects type ‘long long int’, but argument 5 has type ‘off_t’
    compound_io.c:333: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    compound_io.c:333: warning: format ‘%lld’ expects type ‘long long int’, but argument 5 has type ‘off_t’
    store.c: In function ‘is_refill’:
    store.c:215: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    store.c:215: warning: format ‘%lld’ expects type ‘long long int’, but argument 5 has type ‘off_t’
    store.c:215: warning: format ‘%lld’ expects type ‘long long int’, but argument 4 has type ‘off_t’
    store.c:215: warning: format ‘%lld’ expects type ‘long long int’, but argument 5 has type ‘off_t’
    ruby extconf.rb install ferret
    creating Makefile

Jeff Green Direct: 020 7923 5652 Switchboard: 020 7923 5610 Mobile: 0778 951 2033 Jobs Go Public is a limited company registered in England and Wales. Registration Number 3716200. V.A.T Number 777 9458 52. Registered office; 12-16 Laystall Street, London, EC1R 4PF Consider the environment; please don't print this email unless you really need to. From bk at benjaminkrause.com Thu Mar 29 13:23:21 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Thu, 29 Mar 2007 19:23:21 +0200 Subject: [Ferret-talk] ferret on 64bit systems? In-Reply-To: References: <45FA6193.7040102@inforadical.net> Message-ID: > We're having similar issues and were wondering when a fix was likely? > Cheers for all the hard work. We have been running Ferret on two Dual-Core AMD Opteron(tm) Processor 2216 HE CPUs for quite some time now without any crashes (Linux 2.6.19 x86_64, no 32-bit emulation). Do you have more information about these crashes?
Benjamin From alex at pressure.to Thu Mar 29 14:28:36 2007 From: alex at pressure.to (Alex Fenton) Date: Thu, 29 Mar 2007 19:28:36 +0100 Subject: [Ferret-talk] retrieving search result positions In-Reply-To: <20070329081141.GJ19868@cordoba.webit.de> References: <460AB44C.4090002@pressure.to> <20070329081141.GJ19868@cordoba.webit.de> Message-ID: <460C0554.8080105@pressure.to> Jens Kraemer wrote: > On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote: > >> I'd like to be able to run fairly simple queries. I then need to look at >> each term match, and get its document id and the character (not byte) >> position at which it occurs in the source document. >> > Without fully understanding what you want to achieve, I guess > TermVectors are what you're looking for. Thank you - that class has exactly the data I need. Is there any way to extract the individual TermVectors implied by a set of search results? #highlight seems to do this internally, but the only Ruby way I've found to access TVs is via index.reader.term_vector(doc_id, :field). I'd like to be able to find the terms in the results of, e.g., a fuzzy or phrase search. > I'm not sure if they're working > on characters or bytes, though. > Looks like bytes, but I can probably work round that. thanks alex From john at digitalpulp.com Thu Mar 29 18:42:16 2007 From: john at digitalpulp.com (John Bachir) Date: Thu, 29 Mar 2007 18:42:16 -0400 Subject: [Ferret-talk] ferret/lucene syntax Message-ID: <162AB212-F20A-47A0-9D99-E38484148202@digitalpulp.com> I just noticed this example in the Lucene documentation*: title:(+return +"pink panther") I have been using this syntax: +title:(return AND "pink panther") Seemingly with success. Are both acceptable? I couldn't find any documentation on "the plus sign" itself. Thanks for any pointers. John *http://lucene.apache.org/java/docs/queryparsersyntax.html#Boolean%20operators From john at digitalpulp.com Thu Mar 29 18:34:15 2007 From: john at digitalpulp.com (John Bachir) Date: Thu, 29 Mar 2007 18:34:15 -0400 Subject: [Ferret-talk] nil's representation in the index? Message-ID: <33664F14-0FAE-4754-92F6-6711415B234D@digitalpulp.com> How are Ruby nil values represented in the index? Thanks, John From kraemer at webit.de Fri Mar 30 03:32:23 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 30 Mar 2007 09:32:23 +0200 Subject: [Ferret-talk] ferret/lucene syntax In-Reply-To: <162AB212-F20A-47A0-9D99-E38484148202@digitalpulp.com> References: <162AB212-F20A-47A0-9D99-E38484148202@digitalpulp.com> Message-ID: <20070330073223.GA580@cordoba.webit.de> On Thu, Mar 29, 2007 at 06:42:16PM -0400, John Bachir wrote: > I jut noticed this example in the lucene documentation*: > > title:(+return +"pink panther") > > I have been using this syntax: > > +title:(return AND "pink panther") > > Seemingly with success. Are both acceptable? I couldn't find any > documentation on "the plus sign" itself. The plus sign marks a required clause in a query: a document can only be a hit if it matches that clause. The opposite is the minus sign: documents that match such a clause can't be hits. Internally, Ferret doesn't handle AND and the like; the query parser translates them, i.e. 'a AND b' --> '+a +b'. Clauses without + or - are optional 'nice to have' clauses: they raise a document's score if they match, but the doc won't be excluded from the hits if they don't. So 'a OR b' gets transformed into 'a b'. Jens -- Jens Krämer webit!
Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Fri Mar 30 03:33:56 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 30 Mar 2007 09:33:56 +0200 Subject: [Ferret-talk] nil's representation in the index? In-Reply-To: <33664F14-0FAE-4754-92F6-6711415B234D@digitalpulp.com> References: <33664F14-0FAE-4754-92F6-6711415B234D@digitalpulp.com> Message-ID: <20070330073356.GB580@cordoba.webit.de> On Thu, Mar 29, 2007 at 06:34:15PM -0400, John Bachir wrote: > How are ruby nil values represented in the index? Not at all, I guess. Ferret works on strings, and it makes no sense to store an empty string in the index. cheers, Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Fri Mar 30 03:39:12 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 30 Mar 2007 09:39:12 +0200 Subject: [Ferret-talk] retrieving search result positions In-Reply-To: <460C0554.8080105@pressure.to> References: <460AB44C.4090002@pressure.to> <20070329081141.GJ19868@cordoba.webit.de> <460C0554.8080105@pressure.to> Message-ID: <20070330073912.GC580@cordoba.webit.de> On Thu, Mar 29, 2007 at 07:28:36PM +0100, Alex Fenton wrote: > Jens Kraemer wrote: > > On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote: > > > >> I'd like to be able to run fairly simple queries. I then need to look at > >> each term match, and get its document id and the character (not byte) > >> position at which it occurs in the source document. > >> > > Without fully understanding what you want to achieve, I guess > > TermVectors are what you're looking for. > Thank you - that class has exactly the data I need.
Is there any way to > extract the individual TermVectors implied by a set of search results? > > #highlight seems to do this internally, but the only ruby way I've found > to access TVs is via index.reader.term_vector(docid_id, :field). I'd > like to be able to find the terms in results of eg a fuzzy or phrase search. You will get the doc_ids back from your search, so wouldn't it work to just do a search_each and retrieve the term vectors inside the block?

    index.search_each(query) do |doc_id, score|
      tv = index.reader.term_vector(doc_id, :field)
    end

Jens -- Jens Krämer webit! Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From alex at pressure.to Fri Mar 30 04:03:10 2007 From: alex at pressure.to (Alex Fenton) Date: Fri, 30 Mar 2007 09:03:10 +0100 Subject: [Ferret-talk] retrieving search result positions In-Reply-To: <20070330073912.GC580@cordoba.webit.de> References: <460AB44C.4090002@pressure.to> <20070329081141.GJ19868@cordoba.webit.de> <460C0554.8080105@pressure.to> <20070330073912.GC580@cordoba.webit.de> Message-ID: <460CC43E.80300@pressure.to> Jens Kraemer wrote: >> #highlight seems to do this internally, but the only ruby way I've found >> to access TVs is via index.reader.term_vector(docid_id, :field). I'd >> like to be able to find the terms in results of eg a fuzzy or phrase search. >> > > you will get the doc_ids back from your search, so wouldn't it work to > just do a search_each and retrieve the term vectors inside the block? > > index.search_each(query) do |doc_id, score| > tv = index.reader.term_vector(doc_id, :field) > end > I'll give it a try, but if it was a fuzzy match I'm not sure I would know the exact term that was matched. Similarly with a phrase match: I think I would have to manually verify that a particular occurrence of one term met the phrase criteria.
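Both of Alex's remaining problems, confirming that a phrase really occurred and turning byte offsets into character offsets, can be worked out from the per-term positions and offsets that term vectors record. A minimal plain-Ruby sketch under stated assumptions: the term vector is modelled as a hypothetical hash of term => list of occurrences (token position, start byte, end byte), built here by a naive word tokenizer rather than by Ferret, and the source text is UTF-8. None of these helper names are Ferret's TermVector API:

```ruby
# Hypothetical, simplified stand-in for one document's term vector:
# term => list of occurrences, each with a token position and byte offsets.
TermOcc = Struct.new(:position, :start_byte, :end_byte)

# Builds that structure with a naive lowercase word tokenizer, recording
# byte offsets (which is what Alex observed Ferret reporting).
def naive_term_vector(text)
  tv = Hash.new { |h, k| h[k] = [] }
  position = 0
  text.scan(/\p{L}+/) do
    m = Regexp.last_match
    start_byte = text[0...m.begin(0)].bytesize
    tv[m[0].downcase] << TermOcc.new(position, start_byte, start_byte + m[0].bytesize)
    position += 1
  end
  tv
end

# Finds phrase occurrences by checking that each following term sits at
# the next token position. Returns [first_occ, last_occ] pairs.
def phrase_occurrences(tv, phrase_terms)
  first, *rest = phrase_terms
  matches = tv.fetch(first, []).map do |occ|
    tail = rest.each_with_index.map do |term, i|
      tv.fetch(term, []).find { |o| o.position == occ.position + i + 1 }
    end
    tail.include?(nil) ? nil : [occ, tail.last || occ]
  end
  matches.compact
end

# Converts a byte offset (assumed to fall on a character boundary) into a
# character offset, so that String#[] can be used on the original text.
def char_offset(text, byte_offset)
  text.byteslice(0, byte_offset).length
end

text = "Über die Straße: die Straße ist breit."
tv = naive_term_vector(text)

phrase_occurrences(tv, %w[die straße]).each do |first, last|
  from = char_offset(text, first.start_byte)
  to   = char_offset(text, last.end_byte)
  puts "#{from}...#{to} (bytes #{first.start_byte}...#{last.end_byte}): #{text[from...to]}"
end
# prints:
# 5...15 (bytes 6...17): die Straße
# 17...27 (bytes 19...30): die Straße
```

The multibyte "Ü" and "ß" are what make the byte and character offsets drift apart; for a fuzzy query, the same per-term data would at least show which stored terms a document actually contains.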
thanks alex From erik at ehatchersolutions.com Fri Mar 30 17:58:09 2007 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Fri, 30 Mar 2007 17:58:09 -0400 Subject: [Ferret-talk] Fwd: New Short Cut: Ferret References: Message-ID: Congrats David!!! Very cool. Erik Begin forwarded message: > From: "O'Reilly Media" > Date: March 30, 2007 6:22:36 PM EDT > To: erik at ehatchersolutions.com > Subject: New Short Cut: Ferret > > ***New from the O'Reilly Store*** > > Ferret > http://www.oreilly.com/catalog/9780596527853 > > By David Balmain > First Edition: March 2007 > ISBN: 0-596-52785-3 > Pages: 91 > PDF Price: $9.99 USD > > With the introduction of Ferret, Ruby users now have one > of the fastest and most flexible search libraries available. > And it's surprisingly easy to use. > > > This Short Cut will show you how to quickly get up and > running with Ferret. You'll learn how to index different > document types such as PDF, Microsoft Word, and HTML, as > well as how to deal with foreign languages and different > character encodings. This document describes the Ferret > Query Language in detail along with the object-oriented > approach to building queries. > > > You will also be introduced to sorting, filtering, and > highlighting your search results, with an explanation of > exactly how you need to set up your index to perform these > tasks. You will also learn how to optimize a Ferret index > for lightning fast indexing and split-second query results. > > Want to receive future O'Reilly emails in HTML, > or change your O'Reilly Newsletter settings?
Please visit:
> http://www.oreillynet.com/cs/nl/home
>
> For assistance, email help at oreillynet.com
>
> O'Reilly Media, Inc.
> 1005 Gravenstein Highway North
> Sebastopol, CA 95472
> (707) 827-7000

From hsandjaja at gmail.com Fri Mar 30 21:16:04 2007 From: hsandjaja at gmail.com (Harman Sandjaja) Date: Sat, 31 Mar 2007 03:16:04 +0200 Subject: [Ferret-talk] Problem with setting up remote indexing Message-ID: <54d4de205945b182657d5ad14682ad65@ruby-forum.com>

Hello,

I have been trying to set up remote indexing for acts_as_ferret and followed the guide here: http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer

I added :remote => true to my models and then specified host and port for the production environment.

After defining the host/port for production, I tried to run my development server and received the following error:

vendor/plugins/acts_as_ferret/lib/act_methods.rb:66:in `acts_as_ferret':
You have a nil object when you didn't expect it! (NoMethodError)
You might have expected an instance of Array.

Isn't it supposed to work just fine even though we specify neither the development nor the test environment in the ferret_server.yml?

Anyway, I then decided to specify the development environment in the ferret_server.yml. Here is how it's set up:

development:
  host: localhost
  port: 3000 (my rails app port)

Sure enough, I don't get that error anymore and my development server boots up just fine. However, when I executed my search it didn't work (I use id_multi_search btw, so I need to add :store_class_name => true to my models as well).
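Two details in this report can be checked with a few lines of plain Ruby. DRb prefixes every message with a 4-byte big-endian length, so if the client talks to something that is not a DRb server, the first bytes of whatever comes back are read as that length. The packet size in the DRbConnError reported just below (1213486160) is exactly the ASCII bytes "HTTP", which is consistent with the configured port pointing at the Rails web server rather than a DRb server. The helper `drb_settings` and the port 9010 are hypothetical, not part of acts_as_ferret; they only illustrate how a missing environment section turns into nil.

```ruby
require 'yaml'

# DRb frames each message with a 4-byte big-endian length ("N" in pack terms).
# Reading the start of an HTTP reply ("HTTP/1.1 ...") as that length gives:
bogus_length = "HTTP".unpack("N").first
puts bogus_length   # prints 1213486160, the size in "too large packet"

# Hypothetical helper, not part of acts_as_ferret: return the DRb settings for
# one environment, or nil when ferret_server.yml has no section for it, and
# reject a port that collides with the Rails app server.
def drb_settings(yaml_text, env, rails_port)
  settings = YAML.load(yaml_text)[env]
  return nil if settings.nil?
  if settings["port"].to_i == rails_port
    raise ArgumentError, "DRb port #{settings['port']} is the Rails app port"
  end
  settings
end

# Example config; 9010 is an arbitrary free port chosen for illustration.
config = <<~YAML
  production:
    host: localhost
    port: 9010
YAML

p drb_settings(config, "production", 3000)
p drb_settings(config, "development", 3000)   # => nil
```

Code that indexes the result of such a lookup without checking for nil fails much like the NoMethodError above.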
The following is the error description:

DRb::DRbConnError (too large packet 1213486160):
    C:/ruby/lib/ruby/1.8/drb/drb.rb:573:in `load'
    C:/ruby/lib/ruby/1.8/drb/drb.rb:632:in `recv_reply'
    C:/ruby/lib/ruby/1.8/drb/drb.rb:921:in `recv_reply'
    C:/ruby/lib/ruby/1.8/drb/drb.rb:1195:in `send_message'
    C:/ruby/lib/ruby/1.8/drb/drb.rb:1086:in `method_missing'
    C:/ruby/lib/ruby/1.8/drb/drb.rb:1170:in `open'
    C:/ruby/lib/ruby/1.8/drb/drb.rb:1085:in `method_missing'
    C:/ruby/lib/ruby/1.8/drb/drb.rb:1103:in `with_friend'
    C:/ruby/lib/ruby/1.8/drb/drb.rb:1084:in `method_missing'
    C:/work/myapp/vendor/plugins/acts_as_ferret/lib/remote_index.rb:25:in `id_multi_search'
    C:/work/myapp/vendor/plugins/acts_as_ferret/lib/class_methods.rb:117:in `id_multi_search'
    C:/work/myapp/lib/ferret_search.rb:73:in `quick_search'
    ...

Any clue guys? Thank you in advance for the help.

From john at digitalpulp.com Sat Mar 31 01:18:47 2007 From: john at digitalpulp.com (John Joseph Bachir) Date: Sat, 31 Mar 2007 01:18:47 -0400 Subject: [Ferret-talk] ferret/lucene syntax In-Reply-To: <20070330073223.GA580@cordoba.webit.de> References: <162AB212-F20A-47A0-9D99-E38484148202@digitalpulp.com> <20070330073223.GA580@cordoba.webit.de> Message-ID: <8D1B28AD-6D5C-4F44-AA39-AB5F8AB30742@digitalpulp.com>

On Mar 30, 2007, at 3:32 AM, Jens Kraemer wrote:
> the plus sign marks a required clause in a query. A document can only be
> a hit if it matches that clause. The opposite of this is the minus sign,
> documents that match such a clause can't be a hit. Internally, Ferret
> doesn't handle AND and such, they get translated by the query parser,
> i.e. 'a AND b' --> '+a +b'
>
> Clauses without + or - are optional 'nice to have' clauses, they will
> raise a document's score if they match, but the doc won't be excluded
> from the hits if they don't. So 'a OR b' gets transformed into 'a b'.

Thanks for that, I actually was completely unaware of the case without + or -. Very nice.
However, my question was actually simpler: are the semantics of these two bits of a query the same?

title:(+return +"pink panther")

+title:(return AND "pink panther")

Thanks,
John

From john at digitalpulp.com Sat Mar 31 01:20:50 2007 From: john at digitalpulp.com (John Joseph Bachir) Date: Sat, 31 Mar 2007 01:20:50 -0400 Subject: [Ferret-talk] nil's representation in the index? In-Reply-To: <20070330073356.GB580@cordoba.webit.de> References: <33664F14-0FAE-4754-92F6-6711415B234D@digitalpulp.com> <20070330073356.GB580@cordoba.webit.de> Message-ID: <63788E1E-C3EA-4903-ADFB-57F76A1FDBDE@digitalpulp.com>

On Mar 30, 2007, at 3:33 AM, Jens Kraemer wrote:
> On Thu, Mar 29, 2007 at 06:34:15PM -0400, John Bachir wrote:
>> How are ruby nil values represented in the index?
>
> not at all, I guess. Ferret works on strings, and it makes no sense to
> store an empty string in the index.

I am trying to query for objects that match certain terms in field A, and field B = nil in the model. (The column is untokenized.)

John

From john at digitalpulp.com Sat Mar 31 01:31:39 2007 From: john at digitalpulp.com (John Joseph Bachir) Date: Sat, 31 Mar 2007 01:31:39 -0400 Subject: [Ferret-talk] DRb server & aaf gem Message-ID:

I'm having problems getting the DRb server running with the aaf gem. I tried it with the plugin installed in my application, and it worked. I suspect the problem has something to do with the startup scripts expecting certain files to be in certain relative file paths. Any insights are appreciated, and maybe if you have time you can update the wiki document :) Thanks for a great tool.
John From erik at ehatchersolutions.com Sat Mar 31 01:40:05 2007 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Sat, 31 Mar 2007 01:40:05 -0400 Subject: [Ferret-talk] ferret/lucene syntax In-Reply-To: <8D1B28AD-6D5C-4F44-AA39-AB5F8AB30742@digitalpulp.com> References: <162AB212-F20A-47A0-9D99-E38484148202@digitalpulp.com> <20070330073223.GA580@cordoba.webit.de> <8D1B28AD-6D5C-4F44-AA39-AB5F8AB30742@digitalpulp.com> Message-ID: On Mar 31, 2007, at 1:18 AM, John Joseph Bachir wrote: > > On Mar 30, 2007, at 3:32 AM, Jens Kraemer wrote: >> the plus sign marks a required clause in a query. A document can >> only be >> a hit if it matches that clause. The opposite of this is the minus >> sign, >> documents that match such a clause can't be a hit. Internally, Ferret >> doesn't handle AND and such, they get translated by the query parser, >> i.e. 'a AND b' --> '+a +b' >> >> Clauses without + or - are optional 'nice to have' clauses, they will >> raise a document's score if they match, but the doc won't be excluded >> from the hits if they don't. So 'a OR b' gets transformed into 'a b'. > > > Thanks for that, I actually was completely unaware of the case > without + or -. Very nice. > > However, my question was actually more simple: are the semantics of > these two bit of a query the same? > > > title:(+return +"pink panther") > > +title:(return AND "pink panther") I replied to this the other day, but I think my sending address was incorrect and it didn't go through - sorry 'bout that. My reply was this: ----- Yup, both are acceptable. An "AND" actually affects both sides and forces them to be required, as if you had used + in front. + is documented in the document you referenced, just below AND. If you are combining the queries above with other clauses they may not be equivalent due to the +title, but if these are the full queries they are equivalent (at least in Java Lucene). 
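The clause semantics described in this thread can be modelled in a few lines of plain Ruby. This is a toy, not Ferret's query parser: it handles only flat queries (no field prefixes, grouping or operator precedence), treats a document as plain text, and ignores scoring. It shows that 'return AND "pink panther"' turns into the same two required clauses as '+return +"pink panther"', and that optional clauses decide matching only when no clause is required.

```ruby
# Toy model of boolean clauses: :must (+), :must_not (-), :should (no prefix).
Clause = Struct.new(:occur, :term)

# Turn flat "a AND b", "a OR b", "NOT c" input into clauses.
def to_clauses(query)
  clauses = []
  pending = :should
  query.scan(/"[^"]*"|\S+/).each do |tok|
    case tok
    when "AND"
      # AND makes the clause on its left required, as well as the next one.
      clauses.last.occur = :must if clauses.last && clauses.last.occur == :should
      pending = :must
    when "OR"  then pending = :should
    when "NOT" then pending = :must_not
    else
      clauses << Clause.new(pending, tok.delete('"'))
      pending = :should
    end
  end
  clauses
end

# Render clauses in +/- syntax, re-quoting multi-word phrases.
def to_lucene(clauses)
  clauses.map do |c|
    prefix = { must: "+", must_not: "-", should: "" }[c.occur]
    term = c.term.include?(" ") ? "\"#{c.term}\"" : c.term
    prefix + term
  end.join(" ")
end

# Does a plain-text document satisfy the clauses? (Matching only, no scoring.)
def matches?(clauses, text)
  padded = " #{text.downcase.split.join(' ')} "
  hit = ->(c) { padded.include?(" #{c.term.downcase} ") }
  return false if clauses.any? { |c| c.occur == :must && !hit.(c) }
  return false if clauses.any? { |c| c.occur == :must_not && hit.(c) }
  # With no required clause, at least one optional clause must match.
  clauses.any? { |c| c.occur == :must } ||
    clauses.any? { |c| c.occur == :should && hit.(c) }
end

q = to_clauses('return AND "pink panther"')
puts to_lucene(q)                              # prints +return +"pink panther"
puts to_lucene(to_clauses("a OR b"))           # prints a b
p matches?(q, "Return of the Pink Panther")    # => true
p matches?(q, "The Pink Panther")              # => false
```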
Erik From caleb at inforadical.net Sat Mar 31 01:43:54 2007 From: caleb at inforadical.net (Caleb Clausen) Date: Fri, 30 Mar 2007 22:43:54 -0700 Subject: [Ferret-talk] ferret on 64bit systems? In-Reply-To: References: Message-ID: <460DF51A.7090903@inforadical.net> Regarding the recent questions on using ferret on 64-bit systems: I've found that for the most part, recent ferrets work if you know how to trick them into relative reliability. All the problems that I've had so far I've been able to work around by segregating the code that seems to crash in separate process(es). If you can avoid doing too much in one process, the crashes disappear. This is not a stable solution; I look forward as well to bug fixes by Dave. I should add that at this point, most of my testing is on my 32bit development system; tho I have seen many crashes there too. :( From kraemer at webit.de Sat Mar 31 04:44:45 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 31 Mar 2007 10:44:45 +0200 Subject: [Ferret-talk] Problem with setting up remote indexing In-Reply-To: <54d4de205945b182657d5ad14682ad65@ruby-forum.com> References: <54d4de205945b182657d5ad14682ad65@ruby-forum.com> Message-ID: <20070331084445.GA10093@cordoba.webit.de> On Sat, Mar 31, 2007 at 03:16:04AM +0200, Harman Sandjaja wrote: > Hello, > > I have been trying to set up the remote indexing for acts_as_ferret and > followed the guide here: > http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer > I added :remote => true to my models and then specified host and port > for the production environment. > > After defining the host/port for production, I tried to run my > development server and received the following error: > vendor/plugins/acts_as_ferret/lib/act_methods.rb:66:in `acts_as_ferret': > You have a nil object when you didn't expect it! (NoMethodError) > You might have expected an instance of Array. 
>
> Isn't it supposed to work just fine even though we don't specify neither
> the development nor the test environment in the ferret_server.yml?
>
> Anyways, I then decided to specify development environment in the
> ferret_server.yml
> Here is how it's setup:
> development:
>   host: localhost
>   port: 3000 (my rails app port)

I guess that's the problem: this port is *not* the Rails app port, but the port the DRb server will listen on. It needs to be a free, unused port on the machine where the DRb server should run. Also be sure to start the DRb server with script/ferret_start.

Jens

From kraemer at webit.de Sat Mar 31 05:25:02 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 31 Mar 2007 11:25:02 +0200 Subject: [Ferret-talk] DRb server & aaf gem In-Reply-To: References: Message-ID: <20070331092502.GB10093@cordoba.webit.de>

On Sat, Mar 31, 2007 at 01:31:39AM -0400, John Joseph Bachir wrote:
> I'm having problems getting the DRb server running with the aaf gem.
> I tried it with the plugin installed in my application, and it
> worked. I suspect the problem has something to do with the startup
> scripts expecting certain files to be in certain relative file paths.

I guess you did create ferret_server.yml in RAILS_ROOT/config? What's the exact error you get?

> Any insights are appreciated, and maybe if you have time you can
> update the wiki document :)

I just gave it a try ;-) However, there's no gem-specific info yet; I'll try to throw a script together that eases setting up a project to use the aaf gem.

cheers,
Jens

From kraemer at webit.de Sat Mar 31 05:30:19 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 31 Mar 2007 11:30:19 +0200 Subject: [Ferret-talk] nil's representation in the index? In-Reply-To: <63788E1E-C3EA-4903-ADFB-57F76A1FDBDE@digitalpulp.com> References: <33664F14-0FAE-4754-92F6-6711415B234D@digitalpulp.com> <20070330073356.GB580@cordoba.webit.de> <63788E1E-C3EA-4903-ADFB-57F76A1FDBDE@digitalpulp.com> Message-ID: <20070331093019.GC10093@cordoba.webit.de>

On Sat, Mar 31, 2007 at 01:20:50AM -0400, John Joseph Bachir wrote:
>
> On Mar 30, 2007, at 3:33 AM, Jens Kraemer wrote:
>
> > On Thu, Mar 29, 2007 at 06:34:15PM -0400, John Bachir wrote:
> >> How are ruby nil values represented in the index?
> >
> > not at all, I guess. Ferret works on strings, and it makes no sense to
> > store an empty string in the index.
>
>
> I am trying to query for objects that match certain terms in field A,
> and field B = nil in the model. (The column is untokenized)

I think the only way to achieve this is to represent the nil with some non-nil value in the index ('NIL' or something like this) so you can search for it.

Jens

From manoel at lemos.net Sat Mar 31 09:45:40 2007 From: manoel at lemos.net (Manoel Lemos) Date: Sat, 31 Mar 2007 15:45:40 +0200 Subject: [Ferret-talk] Sorting issues, can anyone help me?
Message-ID: <00996aebb696dd986364afd412bb79ac@ruby-forum.com>

I have this model:

class Post < ActiveRecord::Base
  acts_as_ferret :fields => {
                   :title          => {:boost => 2},
                   :description    => {},
                   :url            => {},
                   :rank_sort      => {:index => :untokenized_omit_norms, :term_vector => :no},
                   :posted_at_sort => {:index => :untokenized_omit_norms, :term_vector => :no}
                 },
                 :remote => true

  belongs_to :blog

  def rank_sort
    begin
      return self.blog.rank_links.to_i
    rescue
      return nil
    end
  end

  def posted_at_sort
    begin
      return self.posted_at.to_i
    rescue
      return nil
    end
  end
end

But when I try to sort by :rank_sort or :posted_at_sort it didn't work, see:

>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new(:rank_sort, :reverse => false)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new(:rank_sort, :reverse => false)} )[1].last.posted_at_sort
=> [1085549830]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new(:rank_sort, :reverse => true)} )[1].last.posted_at_sort
=> [1085549830]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new(:rank_sort, :reverse => true)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new(:rank_sort, :reverse => :true)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new(:rank_sort, :reverse => :false)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new("rank_sort", :reverse => :false)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new("rank_sort", :reverse => false)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new("rank_sort", :reverse => true)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new("rank_sort", :reverse => :true)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new("rank_sort", :reverse => :yes)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new("rank_sort", :reverse => :no)} )[1].first.posted_at_sort
=> [1146857920]
>> h,r = Post.full_text_search("voltamos", {:sort => Ferret::Search::SortField.new("rank_sort", :reverse => :no)} )[1].last.posted_at_sort
=> [1085549830]
>>

Can anyone help me with this?

[]s
Manoel

From john at digitalpulp.com Sat Mar 31 11:53:28 2007 From: john at digitalpulp.com (John Joseph Bachir) Date: Sat, 31 Mar 2007 11:53:28 -0400 Subject: [Ferret-talk] nil's representation in the index? In-Reply-To: <20070331093019.GC10093@cordoba.webit.de> References: <33664F14-0FAE-4754-92F6-6711415B234D@digitalpulp.com> <20070330073356.GB580@cordoba.webit.de> <63788E1E-C3EA-4903-ADFB-57F76A1FDBDE@digitalpulp.com> <20070331093019.GC10093@cordoba.webit.de> Message-ID: <7BC13A4C-7CE6-488E-9DF1-EB1F697B6350@digitalpulp.com>

On Mar 31, 2007, at 5:30 AM, Jens Kraemer wrote:
>> I am trying to query for objects that match certain terms in field A,
>> and field B = nil in the model. (The column is untokenized)
>
> I think the only way to achieve this is to represent the nil with some
> non-nil value in the index ('NIL' or something like this) so you can
> search for it.

That's precisely what I ended up doing. :) Thanks.
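A minimal sketch of the sentinel approach, in plain Ruby. `Item`, `field_b_for_index` and the "NIL" marker are illustrative names only; in a Rails app the method would live on the ActiveRecord model and be listed in the acts_as_ferret :fields option. With an untokenized field, as here, the marker should be indexed verbatim; with a tokenized, lowercasing analyzer it would likely end up as "nil", so query for it in whatever form the analyzer produces.

```ruby
# Illustrative stand-in for a model; in Rails this would be the ActiveRecord
# class and :field_b_for_index would be listed in acts_as_ferret's :fields.
NIL_MARKER = "NIL"  # assumed sentinel: choose a value real data never contains

Item = Struct.new(:field_a, :field_b) do
  # Value handed to the indexer instead of the raw, possibly nil, column.
  def field_b_for_index
    field_b.nil? ? NIL_MARKER : field_b.to_s
  end
end

items = [Item.new("red apple", nil), Item.new("red car", "sold")]
p items.map(&:field_b_for_index)   # => ["NIL", "sold"]

# In-memory stand-in for a query like: field_a:red AND field_b_for_index:NIL
hits = items.select do |i|
  i.field_a.split.include?("red") && i.field_b_for_index == NIL_MARKER
end
p hits.map(&:field_a)              # => ["red apple"]
```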
From john at digitalpulp.com Sat Mar 31 12:02:43 2007 From: john at digitalpulp.com (John Joseph Bachir) Date: Sat, 31 Mar 2007 12:02:43 -0400 Subject: [Ferret-talk] DRb server & aaf gem In-Reply-To: <20070331092502.GB10093@cordoba.webit.de> References: <20070331092502.GB10093@cordoba.webit.de> Message-ID: <3D007C4C-9F03-4CB5-940E-AD084A933DE6@digitalpulp.com>

On Mar 31, 2007, at 5:25 AM, Jens Kraemer wrote:
> On Sat, Mar 31, 2007 at 01:31:39AM -0400, John Joseph Bachir wrote:
>> I'm having problems getting the DRb server running with the aaf gem.
>> I tried it with the plugin installed in my application, and it
>> worked. I suspect the problem has something to do with the startup
>> scripts expecting certain files to be in certain relative file paths.
>
> I guess you did create ferret_server.yml in RAILS_ROOT/config?
> What's the exact error you get?

Yes, I created the ferret_server.yml in /config and added the remote declaration to my models. Then I put the three startup scripts into /script/aaf/, and I was trying to run them from there.

I got errors similar to the ones in the comments of your blog post announcing AAF 0.4.0 [1].

A blog post [2] mentions something that I think the wiki does not: "add AAF_REMOTE = true to your environment file". Is this necessary?

Thanks,
John

[1] http://www.jkraemer.net/2007/3/24/acts_as_ferret-0-4-0-rie
[2] http://www.subelsky.com/2007/03/pitfalls-of-actsasferret-with-drbserver.html

From john at digitalpulp.com Sat Mar 31 12:11:41 2007 From: john at digitalpulp.com (John Joseph Bachir) Date: Sat, 31 Mar 2007 12:11:41 -0400 Subject: [Ferret-talk] DRb server & aaf gem In-Reply-To: <3D007C4C-9F03-4CB5-940E-AD084A933DE6@digitalpulp.com> References: <20070331092502.GB10093@cordoba.webit.de> <3D007C4C-9F03-4CB5-940E-AD084A933DE6@digitalpulp.com> Message-ID:

On Mar 31, 2007, at 12:02 PM, John Joseph Bachir wrote:
> I got errors similar to the ones in the comments of your blog post
> announcing AAF 0.4.0[1]
> [1] http://www.jkraemer.net/2007/3/24/acts_as_ferret-0-4-0-rie

This is actually not correct. Here is the error I am getting:

jjb-g4-laptop:~/digitalpulp/ffog$ RAILS_ENV=production script/aaf/ferret_start
./script/../config/../vendor/rails/railties/lib/commands/runner.rb:45: ./script/../config/../vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:266:in `load_missing_constant': uninitialized constant ActsAsFerret (NameError)
        from ./script/../config/../vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:452:in `const_missing'
        from ./script/../config/../vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:464:in `const_missing'
        from (eval):22
        from script/runner:3:in `eval'
        from ./script/../config/../vendor/rails/railties/lib/commands/runner.rb:45
        from script/runner:3

From jjm at codewell.com Sat Mar 31 11:36:41 2007 From: jjm at codewell.com (Jeff Mallatt) Date: Sat, 31 Mar 2007 11:36:41 -0400 Subject: [Ferret-talk] not understanding search results Message-ID: <7.0.1.0.2.20070331113203.0393ac90@codewell.com>

I'm getting some results that I don't understand from a search.
The code, based on the tutorial, and the results are below. Everything makes sense to me, except the results for the 'title:"Some"' query. I would think that it should match the first two documents, but not the third. What am I missing here?

Thanks for any help!

--- code -----------------------------------------------------
require 'ferret'

def query(index, query_str)
  puts("Query '#{query_str}'...")
  index.search_each(query_str) do |id, score|
    puts(" id=#{id} score=#{score} uid=#{index[id][:uid]} title='#{index[id][:title]}'")
  end
end

index = Ferret::Index::Index.new
index << {:uid => 'one', :title => 'Some Title', :content => 'my first text'}
index << {:uid => 'two', :title => 'Some Title', :content => 'some second content'}
index << {:uid => 'three', :title => 'Other Title', :content => 'my third text'}

query(index, 'content:"text"')
query(index, 'content:"some"')
query(index, 'title:"Some"')
query(index, 'title:"Title"')
query(index, 'uid:"two"')

--- results ---------------------------------------
Query 'content:"text"'...
 id=0 score=0.625 uid=one title='Some Title'
 id=2 score=0.625 uid=three title='Other Title'
Query 'content:"some"'...
 id=1 score=0.125318586826324 uid=two title='Some Title'
Query 'title:"Some"'...
 id=0 score=0.0554137788712978 uid=one title='Some Title'
 id=1 score=0.0554137788712978 uid=two title='Some Title'
 id=2 score=0.0554137788712978 uid=three title='Other Title'
Query 'title:"Title"'...
 id=0 score=0.712317943572998 uid=one title='Some Title'
 id=1 score=0.712317943572998 uid=two title='Some Title'
 id=2 score=0.712317943572998 uid=three title='Other Title'
Query 'uid:"two"'...
id=1 score=1.0 uid=two title='Some Title' From andreas.korth at gmx.net Sat Mar 31 13:41:06 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Sat, 31 Mar 2007 19:41:06 +0200 Subject: [Ferret-talk] not understanding search results In-Reply-To: <7.0.1.0.2.20070331113203.0393ac90@codewell.com> References: <7.0.1.0.2.20070331113203.0393ac90@codewell.com> Message-ID: On Mar 31, 2007, at 5:36 PM, Jeff Mallatt wrote: > I'm getting some results that I don't understand from a search. > > index << {:uid => 'one', :title => 'Some Title', :content => 'my > first text'} > index << {:uid => 'two', :title => 'Some Title', :content => 'some > second content'} > index << {:uid => 'three', :title => 'Other Title', :content => 'my > third text'} > > query(index, 'title:"Some"') > query(index, 'title:"Title"') > query(index, 'uid:"two"') Nice one. When people don't understand search results, it's usually to do with stop words. The StandardAnalyzer which parses documents and(!) queries, uses a list of stop words which are ignored. See Ferret::Analysis::FULL_ENGLISH_STOP_WORDS for a complete list of (english) stop words. In the case of "title:Some", "Some" is removed by the analyzer giving only "title:", i.e. an empty query which (surprisingly) matches all documents. However, the same should happen with "content:some" but this one returns only one document which leaves me completely puzzled. This just isn't consistent. So I'm afraid I can't be of much help here, but I'm sure somebody else will enlighten us. This might as well be a bug, but even if it's not, it's definitely not what anyone would reasonably expect. -- @David: You should probably consider changing StandardAnalyzer not to use stop words by default. It confuses people because no one would suspect such a feature to be enabled by default. It just doesn't follow the principle of least astonishment. Even if people want to use stop words, they might not be happy with the ones built into Ferret. 
It very much depends on the nature of the content that is indexed, and instead of using a one-size-fits-all stop word list one is usually better off compiling a custom one for any particular application.

Cheers,
Andy

From marvin at rectangular.com Sat Mar 31 14:46:41 2007 From: marvin at rectangular.com (Marvin Humphrey) Date: Sat, 31 Mar 2007 11:46:41 -0700 Subject: [Ferret-talk] not understanding search results In-Reply-To: References: <7.0.1.0.2.20070331113203.0393ac90@codewell.com> Message-ID:

On Mar 31, 2007, at 10:41 AM, Andreas Korth wrote:
> @David: You should probably consider changing StandardAnalyzer not to
> use stop words by default. It confuses people because no one would
> suspect such a feature to be enabled by default. It just doesn't
> follow the principle of least astonishment.
>
> Even if people want to use stop words, they might not be happy with
> the ones built into Ferret. It very much depends on the nature of the
> content that is indexed, and instead of using a one-size-fits-all stop
> word list one is usually better off compiling a custom one for
> any particular application.

I concur. Ferret's StandardAnalyzer is based upon Lucene's class of the same name, so some parallelism would be lost, but I think omitting stop lists is better nonetheless.

There are performance and disk-space implications for avoiding stop lists by default. However, disk space is cheap, Ferret is fast, and search results are slightly better when you avoid stop lists (e.g. searching for '"the who"' actually returns something). Users with large deployments will be able to trade away some amount of IR precision for increased performance by enabling stop lists if they so choose.

KinoSearch doesn't have a StandardAnalyzer; a class called PolyAnalyzer fills that role. By default, it performs lowercasing, tokenizing and stemming -- but no stopalizing.
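To see why 'title:"Some"' behaves oddly, here is a toy analyzer in plain Ruby that lowercases, tokenizes and drops stop words. `STOP_WORDS` is a tiny illustrative list, not Ferret's FULL_ENGLISH_STOP_WORDS. Once the only query term is removed, nothing is left to search for, which fits Andreas's explanation of the empty query. If I read the Ferret API docs correctly, StandardAnalyzer.new takes the stop-word list as its first argument, so passing an empty array should disable stopping; treat that as an assumption to verify against your Ferret version.

```ruby
# Toy analyzer: lowercase, split into alphanumeric tokens, drop stop words.
# STOP_WORDS is a tiny illustrative list, not Ferret's FULL_ENGLISH_STOP_WORDS.
STOP_WORDS = %w[a an and my of some the].freeze

def analyze(text, stop_words = STOP_WORDS)
  text.downcase.scan(/[a-z0-9]+/).reject { |tok| stop_words.include?(tok) }
end

p analyze("Some Title")       # => ["title"]          what gets indexed
p analyze("Some")             # => []                 the query term vanishes
p analyze("my first text")    # => ["first", "text"]
p analyze("Some Title", [])   # => ["some", "title"]  with stopping disabled
```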
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

From hsandjaja at gmail.com Sat Mar 31 16:43:12 2007 From: hsandjaja at gmail.com (Harman Sandjaja) Date: Sat, 31 Mar 2007 22:43:12 +0200 Subject: [Ferret-talk] Problem with setting up remote indexing In-Reply-To: <20070331084445.GA10093@cordoba.webit.de> References: <54d4de205945b182657d5ad14682ad65@ruby-forum.com> <20070331084445.GA10093@cordoba.webit.de> Message-ID:

Thank you for the reply Jens!

What I'm trying to do is to only use the DRb server in the production environment (not in development or test). So I removed the development and test sections from config/ferret_server.yml. But I'm getting this error instead:

vendor/plugins/acts_as_ferret/lib/act_methods.rb:66:in `acts_as_ferret':
You have a nil object when you didn't expect it! (NoMethodError)
You might have expected an instance of Array.

Sorry for not being clear enough in my post earlier.

From zackizacki at gmx.net Sat Mar 31 17:56:48 2007 From: zackizacki at gmx.net (Rainer Kern) Date: Sat, 31 Mar 2007 23:56:48 +0200 Subject: [Ferret-talk] Problem with encoding (Umlaut: ü, ä...) Message-ID:

Hi there from Germany,

I just installed and set up ferret and acts_as_ferret for Rails, all of them at the most recent version. The development environment is running fine on Mac OS X, but I have problems with the production environment (Debian).

A few records stored in the (MySQL) database contain German umlauts (? for example). Running a query for "köln" returns the correct record in the development environment but NOTHING on the Debian system. But the logs are looking good.
It seems the word was correctly submitted:

Processing SearchController#result (for 127.0.0.1 at 2007-03-31 23:45:47) [POST]
  Session ID: 55f4544e0b28e991a1460b05dc09744c
  Parameters: {"commit"=>"suchen", "action"=>"result", "controller"=>"search", "query"=>"köln"}

I read a few things here in the forum and elsewhere, but did not find any solution. Would you please give me some pointers? I really can't find my way through all this encoding, locale and collation stuff. What do I have to configure, and how? It would be really nice if you could help.

From andreas.korth at gmx.net Sat Mar 31 18:33:43 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Sun, 1 Apr 2007 00:33:43 +0200 Subject: [Ferret-talk] Problem with encoding (Umlaut: ü, ä...) In-Reply-To: References: Message-ID:

On Mar 31, 2007, at 11:56 PM, Rainer Kern wrote:
> I just installed and set up ferret and act_as_ferret for rails. All of
> them at the most recent version. The development environment is running
> fine with Mac OS X. But I got problems with the productive environment
> (debian).
>
> In the (mysql-)database are few records stored, containig german umlauts
> (? for example). Running a query for "köln" returns the correct record
> in dev-environment but NOTHING at the debian system. But the logs are
> looking good. It seems the word was correctly submited:

Your system locale should be set to UTF-8. Use the 'locale' command to view the current settings and change the LANG and LC_ALL environment variables if necessary. (In your case they should probably be set to "de_DE.UTF-8".)

MySQL should be configured to use UTF-8 as well.
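Why an encoding mismatch makes "köln" unfindable can be shown at the byte level. This sketch uses String#encode from Ruby 1.9 and later, which the Ruby 1.8 in this thread does not have, so it only illustrates the principle: an index built from text in one encoding and a query term sent in another compare as different byte strings. It is one common cause of this symptom, not a confirmed diagnosis of this particular setup.

```ruby
# The same word in two encodings has different bytes, so an index built from
# one encoding will not find a query term sent in the other.
utf8   = "köln"                        # Ruby 2.0+ source files default to UTF-8
latin1 = utf8.encode("ISO-8859-1")

p utf8.bytes                   # => [107, 195, 182, 108, 110]
p latin1.bytes                 # => [107, 246, 108, 110]
p utf8.bytes == latin1.bytes   # => false

# Converting both sides to one encoding (UTF-8 here) restores the match.
p latin1.encode("UTF-8") == utf8   # => true
```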
You can either specify the character set for each individual table (via the CREATE TABLE command) or globally in my.cnf:

  character-set-server = utf8
  default-character-set = utf8

Finally, for Rails, add the following lines to environment.rb:

  $KCODE = 'u'
  require 'jcode'

I recommend using UTF-8 throughout the whole stack (OS, MySQL, Rails). That way you'll get rid of your encoding problems once and for all.

Cheers,
Andy

From alexkane at gmail.com Sat Mar 31 18:49:29 2007 From: alexkane at gmail.com (Alex Kane) Date: Sun, 1 Apr 2007 00:49:29 +0200 Subject: [Ferret-talk] Newbie problem on production server In-Reply-To: <788cdcf596143624c14d8a41af62d4c7@ruby-forum.com> References: <12d3825450af0de0f6a367f938727f47@ruby-forum.com> <20070328114533.GB19868@cordoba.webit.de> <6c7a97c1c379e8b6b79a623e31abf752@ruby-forum.com> <20070328153300.GH19868@cordoba.webit.de> <788cdcf596143624c14d8a41af62d4c7@ruby-forum.com> Message-ID: <2faa0c983cc9b6073195ed1419e7ea4a@ruby-forum.com>

>> what happens if you run Model.rebuild_index on the console?
This is what I get:

[alex at alexkane current]$ ./script/runner -e production Link.rebuild_index
./script/../config/../vendor/rails/railties/lib/commands/runner.rb:27: /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:285:in `delete': File Not Found Error occured at :93 in xraise (FileNotFoundError)
Error occured in fs_store.c:329 - fs_open_input
        tried to open "script/../config/../index/production/link/_h8.frq" but it doesn't exist:
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:285:in `<<'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:8:in `synchrolock'
        from /usr/local/lib/ruby/1.8/monitor.rb:238:in `synchronize'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:8:in `synchrolock'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.3/lib/ferret/index.rb:267:in `<<'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:220:in `reindex_model'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:219:in `each'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:219:in `reindex_model'
         ... 10 levels...
        from ./script/runner:3:in `eval'
        from ./script/../config/../vendor/rails/railties/lib/commands/runner.rb:27
        from ./script/runner:3:in `require'
        from ./script/runner:3

From john at digitalpulp.com Sat Mar 31 20:07:15 2007 From: john at digitalpulp.com (John Bachir) Date: Sat, 31 Mar 2007 20:07:15 -0400 Subject: [Ferret-talk] baffling sort problem Message-ID:

I had sort-by-date working almost perfectly with my app. It was behaving as expected for most data, but had a few hiccups with certain data. I investigated and discovered that the correct data was storing this in my ferret index: "1999-10-18 00:00:00" and the incorrect data was storing this: "Mon Oct 18 00:00:00 EDT 1999" (oops...)
So I of course had to fix the incorrect data, and I figured while I was at it, I would normalize and minimize everything to this format: "19991018000000".

Now it seems that sorting on this column does not work at all. I have not changed how the data is stored in the index; it has always been:

  :search_date => {:term_vectors => :no, :index => :untokenized, :store => :yes }

Any ideas? Thanks.

John

From john at digitalpulp.com Sat Mar 31 20:30:57 2007 From: john at digitalpulp.com (John Bachir) Date: Sat, 31 Mar 2007 20:30:57 -0400 Subject: [Ferret-talk] baffling sort problem In-Reply-To: References: Message-ID: <6FCD40EA-2B73-4E88-A18B-83D8BBB4B305@digitalpulp.com>

On Mar 31, 2007, at 8:07 PM, John Bachir wrote:
> I investigated and discovered that the correct data was
> storing this in my ferret index: "1999-10-18 00:00:00" and the
> incorrect data was storing this: "Mon Oct 18 00:00:00 EDT
> 1999" (oops...)
>
> So I of course had to fix the incorrect data, and I figured while i
> was at it, I would normalize and minimize everything to this format:
> "19991018000000".
>
> Now it seems that sorting on this column does not work at all.

I just normalized everything to the "1999-10-18 00:00:00" format, and it is working again. My guess is that ferret is treating the data differently if it is only numeric characters? I've been using ferret for quite some time and have never come across a type issue like this.

Also, on that same model, I have another ferret field, configured the very same way, that is always a number; sorting works perfectly. However, those numbers are much smaller (number of messages in the discussion thread). So maybe ferret has a problem with big numbers?

Anyway, I'm glad it's working again, but would be very interested to know what the problem was.
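A plain-Ruby sketch of the two effects in play here. Fixed-width, zero-padded, most-significant-first strings sort chronologically as plain strings, while the "Mon Oct 18 ..." form sorts by weekday name. For the all-digit format, one unconfirmed hypothesis is that Ferret's automatic sort-type detection treated the values as integers and 14-digit numbers overflowed a 32-bit slot; `to_int32` below only simulates such a wraparound and says nothing about Ferret's actual internals. If I recall Ferret's SortField options correctly, asking explicitly for a string sort (a :type option) would avoid relying on that guess, but check the API docs for your version.

```ruby
# Three consecutive-ish days, already in chronological order.
times = [Time.utc(1999, 10, 18), Time.utc(1999, 10, 19), Time.utc(1999, 10, 21)]

iso   = times.map { |t| t.strftime("%Y-%m-%d %H:%M:%S") }    # "1999-10-18 00:00:00"
ctime = times.map { |t| t.strftime("%a %b %d %H:%M:%S %Y") } # "Mon Oct 18 00:00:00 1999"

p iso.sort == iso       # => true:  string order equals date order
p ctime.sort == ctime   # => false: sorts Mon, Thu, Tue

# Unconfirmed hypothesis: a 14-digit value squeezed into a signed 32-bit slot.
def to_int32(n)
  n &= 0xFFFF_FFFF
  n >= 2**31 ? n - 2**32 : n
end

stamp = 19991018000000
p stamp > 2**31 - 1     # => true: too big for a signed 32-bit integer
p to_int32(stamp)       # the wrapped value such a slot would hold
```

Note that the compact "19991018000000" strings would also sort correctly as plain strings, which is why a numeric interpretation is the more likely suspect.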
Cheers, John From john at digitalpulp.com Sat Mar 31 21:09:55 2007 From: john at digitalpulp.com (John Bachir) Date: Sat, 31 Mar 2007 21:09:55 -0400 Subject: [Ferret-talk] indexing mostly-binary documents (.ppt) Message-ID: Here's an interesting problem: In my app, we are indexing various types of documents, including microsoft powerpoint. Powerpoint documents are mostly binary, but have a bunch of text (all of the text in the document?) as well. My thinking is that the binary will never get searched for, and the proper text will be indexed and queried as expected, so the indexed binary will never affect results. Is this correct? Then my colleague mentioned that maybe the indexed garbage would affect the weighting of certain searches? I figure that weighting is only per-search so, same situation as above, only the proper terms will be calculated. What do you folks think? John
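On the weighting question: in Lucene's default scoring, which Ferret is modelled on (an assumption worth checking for your version), a field's length norm is roughly 1 / sqrt(number of terms), so hundreds of junk tokens from binary data would lower the score of genuine matches in that field, even though the IDF of real words is unaffected. Junk terms also bloat the term dictionary and can occasionally match short queries. A common remedy is to index only the printable text. The sketch below keeps runs of printable ASCII, similar in spirit to the Unix `strings` tool; it is not a PowerPoint parser.

```ruby
# Keep runs of printable ASCII (space to ~) at least min_len long, similar in
# spirit to the Unix `strings` tool. Not a PowerPoint parser.
def printable_runs(data, min_len = 4)
  data.b.scan(/[\x20-\x7E]{#{min_len},}/).map { |s| s.force_encoding(Encoding::UTF_8) }
end

# Lucene-style length norm, assumed to carry over to Ferret: 1 / sqrt(terms).
def length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

sample = "\x00\x01\xD0\xCFQuarterly results\x00\x7F\xFFab\x02Pink Panther\x00".b
p printable_runs(sample)   # => ["Quarterly results", "Pink Panther"]

p length_norm(4)           # => 0.5   a short, clean field
p length_norm(400)         # => 0.05  the same field padded with junk tokens
```

A filter like this drops any text stored as UTF-16 (where every other byte is zero), which binary Office formats often use, so a dedicated text extractor is preferable where one is available.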