From kraemer at webit.de Wed Oct 1 15:54:51 2008
From: kraemer at webit.de (Jens Krämer)
Date: Wed, 1 Oct 2008 21:54:51 +0200
Subject: [Ferret-talk] using tokenizers ?
In-Reply-To: <60d886530809280320n1659a6c4pa63e2480a26dd882@mail.gmail.com>
References: <60d886530809280320n1659a6c4pa63e2480a26dd882@mail.gmail.com>
Message-ID: <29F9FC45-4225-4DB2-B238-71AA5A9A075D@webit.de>

Hi!

Have a look at the PerFieldAnalyzer:

http://ferret.davebalmain.com/api/classes/Ferret/Analysis/PerFieldAnalyzer.html

Cheers,
Jens

On 28.09.2008, at 12:20, Lyes Amazouz wrote:
> Hi list,
>
> I am using Ferret to index some files for a specific usage.
>
> I want to know how I can set a tokenizer for some of my index fields,
> and whether I can choose a different tokenizer for each field. For example:
>
> Suppose my document has two fields, :F1 and :F2. What do I have to do
> if I want the field :F1 to be tokenized with a
> StandardTokenizer and :F2 with the WhiteSpaceTokenizer?
>
> Thank you
>
> --
> ===========
> | Lyes Amazouz
> | USTHB, Algiers
> ===========
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From lyesjob at gmail.com Thu Oct 2 05:58:28 2008
From: lyesjob at gmail.com (Lyes Amazouz)
Date: Thu, 2 Oct 2008 10:58:28 +0100
Subject: [Ferret-talk] using tokenizers ?
In-Reply-To: <29F9FC45-4225-4DB2-B238-71AA5A9A075D@webit.de>
References: <60d886530809280320n1659a6c4pa63e2480a26dd882@mail.gmail.com> <29F9FC45-4225-4DB2-B238-71AA5A9A075D@webit.de>
Message-ID: <60d886530810020258i13299b62jd51d95db4d7eaca2@mail.gmail.com>

OK, thank you, that was what I was looking for.

On Wed, Oct 1, 2008 at 8:54 PM, Jens Krämer wrote:
> Hi!
>
> have a look at the PerFieldAnalyzer:
>
> http://ferret.davebalmain.com/api/classes/Ferret/Analysis/PerFieldAnalyzer.html
>
> cheers,
> Jens
>

--
===========
| Lyes Amazouz
| USTHB, Algiers
===========

From hgs at dmu.ac.uk Fri Oct 3 13:09:43 2008
From: hgs at dmu.ac.uk (Hugh Sasse)
Date: Fri, 3 Oct 2008 18:09:43 +0100 (BST)
Subject: [Ferret-talk] Getting started in Ferret.
Message-ID:

I recently found the Ferret book, looked at it, and decided it was time I
looked into this. A few questions, having rushed through most of the
book once.

O'Reilly books traditionally point to a place where the code can be
downloaded (and one usually gets a tarball split into chapters). Is there
such a place for the Ferret book? I see nothing mentioned...

I'd like to use this for searching my files of various types. There is an
example of how to do this for text, and then there is an explanation of how
to build analysers for other file types. I'd like to know whether it is
intended that pre-made analysers will not be provided. Suppose I write
something that takes a file in mbox format, breaks it into messages, and
provides a document with info about the From:, To:, Bcc:, and Subject lines,
and maybe information about the MIME types (if I ever get that far): would
this kind of thing ever get included in Ferret, so programmers better than
I am can build on it? I looked for gems with ferret "goodies" in them for
this sort of thing, but `gem query -r -n /ferret/i` turned up nothing now
that I've installed ferret. Maybe I munged the format of the command.
I also get this impression because the ferret web site points to various
other places for this sort of thing.

Finally, I was rather alarmed, when looking through the archives, that a
lot of the discussion is about people finding something else to use
instead. Are patches being accepted?
Thank you,
Hugh

From samuelgiffney at gmail.com Sun Oct 5 21:08:40 2008
From: samuelgiffney at gmail.com (Sam Giffney)
Date: Mon, 6 Oct 2008 03:08:40 +0200
Subject: [Ferret-talk] Ferret falling through the Monit cracks
Message-ID: <815e40e21ad9352ae0d790085e5481da@ruby-forum.com>

I have a standard AAF Ferret installation running over DRB, serving
several thousand searches daily. I use the monit script as provided in
the AAF package to ensure the DRB server stays up. I also use the totalmem
line, which is commented out by default:

    if totalmem > 60.0 MB for 5 cycles then restart

Twice in the last two months the DRB server has failed, but monit thought
it was fine, so it required manual intervention to solve (both times I
just restarted the server).

On both occasions the failure directly followed this sequence in the monit
log: ferret exceeds the memory limit and is restarted, seemingly
successfully.

07:41:54] error : 'ferret' total mem amount of 63140kB matches resource limit [total mem amount>61440kB]
07:44:54] error : 'ferret' total mem amount of 63140kB matches resource limit [total mem amount>61440kB]
07:47:54] error : 'ferret' total mem amount of 63140kB matches resource limit [total mem amount>61440kB]
07:50:54] error : 'ferret' total mem amount of 63140kB matches resource limit [total mem amount>61440kB]
07:53:54] error : 'ferret' total mem amount of 63140kB matches resource limit [total mem amount>61440kB]
07:53:54] info : 'ferret' trying to restart
07:53:54] info : 'ferret' stop: /bin/su
07:53:57] info : 'ferret' start: /bin/su
07:53:57] error : 'ferret' failed, cannot open a connection to INET[localhost:9010] via TCP
07:53:57] error : 'ferret' failed, cannot open a connection to INET[localhost:9010] via TCP
07:56:57] info : 'ferret' resource passed
07:56:57] info : 'ferret' connection passed to INET[localhost:9010] via TCP

So at this stage ferret should be up and fine but...
Now any attempt to use ferret gets the error (Entry is the model)

    A ActsAsFerret::IndexNotDefined occurred in search#index:

    entry (druby:/localhost:9010)
    /var/www/releases/20080922044701/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:264:in `get_index'

I've tried to simulate this in the staging environment with an
artificially low memory limit but everything seems to reset fine.

Anyone see anything similar themselves, do anything different or have
any suggestions?

Sam
--
Posted via http://www.ruby-forum.com/.

From cgansen at gmail.com Fri Oct 10 14:34:18 2008
From: cgansen at gmail.com (Chris G.)
Date: Fri, 10 Oct 2008 20:34:18 +0200
Subject: [Ferret-talk] :single_index deprecated?
Message-ID: <630d5a7a4eebd6187165a9e0921ac932@ruby-forum.com>

I noticed that the :single_index option is no longer listed in the
documentation for AAF, specifically in lib/acts_methods.rb [1]

Furthermore, after an upgrade from a rather old version of AAF, I notice
that the index (after reindexing) is no longer in
RAILS_ROOT/index/RAILS_ENV/shared, but instead split out into separate
directories for each model. What changed?

Thanks,
-chris

[1] Notice it's still listed as an option here:
http://actsasferret.rubyforge.org/classes/ActsAsFerret/ActMethods.html
but not at
http://projects.jkraemer.net/acts_as_ferret/browser/trunk/plugin/acts_as_ferret/lib/act_methods.rb
--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Mon Oct 13 12:11:33 2008
From: kraemer at webit.de (Jens Krämer)
Date: Mon, 13 Oct 2008 18:11:33 +0200
Subject: [Ferret-talk] :single_index deprecated?
In-Reply-To: <630d5a7a4eebd6187165a9e0921ac932@ruby-forum.com>
References: <630d5a7a4eebd6187165a9e0921ac932@ruby-forum.com>
Message-ID: <036DF33F-6826-40C2-A01D-C4CA11A88706@webit.de>

Hi!

On 10.10.2008, at 20:34, Chris G.
wrote:
> I noticed that the :single_index option is no longer listed in the
> documentation for AAF, specifically in lib/acts_methods.rb [1]
>
> Furthermore, after an upgrade from a rather old version of AAF, I
> notice
> that the index (after reindexing) is no longer in
> RAILS_ROOT/index/RAILS_ENV/shared, but instead split out into separate
> directories for each model. What changed?

:single_index isn't supported any more.

With current trunk the preferred way to declare an index is via the
ActsAsFerret::define_index method, in a file called config/aaf.rb.
There you may declare several classes using the same index like this:

    ActsAsFerret.define_index('my_index',
      :models => {
        OneModel => {
          :fields => {
            :title => { :boost => 2 },
            :description => { }
          },
          :if => Proc.new { |r| r.published? }
        },
        AnotherModel => {
          :fields => {
            :title => { :boost => 2 },
            :another_field => {}
          },
          :if => Proc.new { |r| r.should_index? }
        }
      },
      :ferret => {
        :default_field => %w( title description another_field )
      })

No need to call acts_as_ferret inside your model anymore (though it
will still work for the one index per class scenario).

my_index is the name of the index, to be used with ActsAsFerret::find:

    ActsAsFerret::find("some query", 'my_index')

To only search records of a given class, find_with_ferret works as
expected:

    OneModel.find_with_ferret('some query')

Sorry for not documenting this stuff better...

cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From cgansen at gmail.com Mon Oct 13 16:19:05 2008
From: cgansen at gmail.com (Chris Gansen)
Date: Mon, 13 Oct 2008 15:19:05 -0500
Subject: [Ferret-talk] :single_index deprecated?
In-Reply-To: <036DF33F-6826-40C2-A01D-C4CA11A88706@webit.de> References: <630d5a7a4eebd6187165a9e0921ac932@ruby-forum.com> <036DF33F-6826-40C2-A01D-C4CA11A88706@webit.de> Message-ID: <6CCF486C-59E0-487C-B991-AC337C55FFCA@gmail.com> Thanks for the update, Jens. Unfortunately, it looks like I'll have to stick with the stable branch for now - these changes in trunk don't jive well with my apps. Thanks and keep up the great work! -chris On Oct 13, 2008, at 11:11 AM, Jens Kr?mer wrote: > Hi! > > On 10.10.2008, at 20:34, Chris G. wrote: > >> I noticed that the :single_index option is no longer listed in the >> documentation for AAF, specifically in lib/acts_methods.rb [1] >> >> Furthermore, after an upgrade from a rather old version of AAF, I >> notice >> that the index (after reindexing) is no longer in >> RAILS_ROOT/index/RAILS_ENV/shared, but instead split out into >> separate >> directories for each model. What changed? > > :single_index isn't supported any more. > > With current trunk the preferred way to declare an index is via the > ActsAsFerret::define_index method, in a file called config/aaf.rb. > There you may declare several classes using the same index like this: > > ActsAsFerret.define_index('my_index', > :models => { > OneModel => { > :fields => { > :title => { :boost => 2 }, > :description => { } > }, > :if => Proc.new { |r| r.published? } > }, > AnotherModel => { > :fields => { > :title => { :boost => 2 }, > :another_field => {}, > }, > :if => Proc.new { |r| r.should_index? } > }, :ferret => { > :default_field => %w( title description > another_field ) > }) > > No need to call acts_as_ferret inside your model anymore (though it > will still work for the one index per class scenario). 
> my_index is the name of the index, to be used with ActsAsFerret::find: > > ActsAsFerret::find("some query", 'my_index') > > to only search records of a given class, find_with_ferret works as > expected: > > OneModel.find_with_ferret('some query') > > > Sorry for not documenting this stuff better... > > > cheers, > Jens > > > -- > Jens Kr?mer > webit! Gesellschaft f?r neue Medien mbH > Schnorrstra?e 76 | 01069 Dresden > Telefon +49351467660 | Telefax +493514676666 > kraemer at webit.de | www.webit.de > > Amtsgericht Dresden | HRB 15422 > GF Sven Haubold > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From timg at catalyst.net.nz Thu Oct 16 02:35:01 2008 From: timg at catalyst.net.nz (Timothy Goddard) Date: Thu, 16 Oct 2008 19:35:01 +1300 Subject: [Ferret-talk] Official repo? Message-ID: <200810161935.08048.timg@catalyst.net.nz> Hi all, After a bit of searching I finally found: A github site - http://github.com/dbalmain/ferret And a git repository - git://github.com/dbalmain/ferret.git They were apparently created by a user called 'dbalmain', suspiciously familiar! Is this the oft mentioned but thus far invisible git repository? It's been inactive since June and Dave seems to be silent. I do hope he hasn't been hit by a bus :( . -- Timothy Goddard Catalyst IT Ltd. ? +64 4 803 2399 ? PO Box 11-053, Manners St, Wellington 6142, New Zealand -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. URL: From lyesjob at gmail.com Thu Oct 16 10:01:04 2008 From: lyesjob at gmail.com (Lyes Amazouz) Date: Thu, 16 Oct 2008 15:01:04 +0100 Subject: [Ferret-talk] acts_as_ferret and indexes created with ferret Message-ID: <60d886530810160701q18b5e170q8688044a11a4a94a@mail.gmail.com> Hello List! 
I have created an index using a Ferret-based application, and my aim is to
create a Ruby on Rails web application that allows searching with that
index.

I tried to use the acts_as_ferret plugin for Rails, but when I read the
tutorials, it seemed to me that acts_as_ferret only allows indexing the
Rails application's own data stored in its database, and helps run
searches using the index it created.

What I really wonder is whether acts_as_ferret can load an external,
already created index (like the one I've created with my application), so
that I can use acts_as_ferret to search it and display the results in a
Rails application.

Is this possible? If not, does anyone have an idea?

Thank you!

--
===========
| Lyes Amazouz
| USTHB, Algiers
===========

From kraemer at webit.de Thu Oct 16 10:13:33 2008
From: kraemer at webit.de (Jens Krämer)
Date: Thu, 16 Oct 2008 16:13:33 +0200
Subject: [Ferret-talk] acts_as_ferret and indexes created with ferret
In-Reply-To: <60d886530810160701q18b5e170q8688044a11a4a94a@mail.gmail.com>
References: <60d886530810160701q18b5e170q8688044a11a4a94a@mail.gmail.com>
Message-ID: <489395D5-B2EA-414D-8BE9-7328F2C52C56@webit.de>

Hi!

On 16.10.2008, at 16:01, Lyes Amazouz wrote:
>
> I have created an index using a Ferret-based application, and my
> aim is to create a Ruby on Rails web application that allows
> searching with that index.
>
> I tried to use the acts_as_ferret plugin for Rails, but when I
> read the tutorials, it seemed to me that acts_as_ferret only allows
> indexing the Rails application's own data stored in its
> database, and helps run searches using the index it created.
> > But I really wonder if acts_as_ferret may allow to load an external > already created index (Like the index I've created with my > application) in order to use the acts_as_ferret technology to make > researches on it and display the results using a Ruby Rails > application! > > Is this possible, if not, have any one any idea! If you don't plan to modify the ferret index from inside your Rails application, the easiest way is to use Ferret's API directly for querying your index. Acts_as_ferret won't be of much help in your case, since it's focus is to keep an index in sync with a database that is accessed via active record. Cheers, Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49351467660 | Telefax +493514676666 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold From lyesjob at gmail.com Thu Oct 16 10:31:42 2008 From: lyesjob at gmail.com (Lyes Amazouz) Date: Thu, 16 Oct 2008 15:31:42 +0100 Subject: [Ferret-talk] acts_as_ferret and indexes created with ferret In-Reply-To: <489395D5-B2EA-414D-8BE9-7328F2C52C56@webit.de> References: <60d886530810160701q18b5e170q8688044a11a4a94a@mail.gmail.com> <489395D5-B2EA-414D-8BE9-7328F2C52C56@webit.de> Message-ID: <60d886530810160731p2156874em4a58a516cdb6c372@mail.gmail.com> On Thu, Oct 16, 2008 at 3:13 PM, Jens Kr?mer wrote: > Hi! > > On 16.10.2008, at 16:01, Lyes Amazouz wrote: > >> >> I've have created an index using a Rerret based application, and may aim >> is to create a Ruby Rails Web application that will allows to search using >> my index! >> >> I tried yo use the acts_as_ferret plugin for Rails, but when I have read >> the tutorials, it seemed to mes that acts_as_ferret allows only to index the >> Ruby Rails web application data stired in it's data base, and helps to >> operate researches using the created index. 
>> >> But I really wonder if acts_as_ferret may allow to load an external >> already created index (Like the index I've created with my application) in >> order to use the acts_as_ferret technology to make researches on it and >> display the results using a Ruby Rails application! >> >> Is this possible, if not, have any one any idea! >> > > If you don't plan to modify the ferret index from inside your Rails > application, the easiest way is to use Ferret's API directly for querying > your index. Acts_as_ferret won't be of much help in your case, since it's > focus is to keep an index in sync with a database that is accessed via > active record. > Hello Jens! Thank you for your reply, now it is more clear for me! But is there some to use the acts_as_ferret features (Like paginating and Highlighting) when I will display the results of my researche! for example, When I make a search on my index with ferret API, is it possible that I give acts_as_ferret the resulting topDocs object or the list of the hits to acts_as_ferret in such way that in the web page, this list will be paginated and the key words higlighted! Thanks > > > Cheers, > Jens > > -- > Jens Kr?mer > webit! Gesellschaft f?r neue Medien mbH > Schnorrstra?e 76 | 01069 Dresden > Telefon +49351467660 | Telefax +493514676666 > kraemer at webit.de | www.webit.de > > Amtsgericht Dresden | HRB 15422 > GF Sven Haubold > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- =========== | Lyes Amazouz | USTHB, Algiers =========== -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgs at dmu.ac.uk Tue Oct 21 05:30:16 2008 From: hgs at dmu.ac.uk (Hugh Sasse) Date: Tue, 21 Oct 2008 10:30:16 +0100 (BST) Subject: [Ferret-talk] Stuart Rackham's Ferret Helper. 
Message-ID: I've put some modified versions of Stuart Rackham's Ferret Helper up at http://www.eng.cse.dmu.ac.uk/~hgs/ruby/#ff.rb My mods are to improve the number of file types searched, and to ease the configuration a little. I've got this to do what I need it to do so far, and don't know if I will add further mods or not. At present I have other pressures, so it will be a while before I look into perfecting this further. That web server is only in existence under sufferance, so the link may vanish (possibly after Christmas) but in the hope that this is of use to someone... Hugh From jk at jkraemer.net Tue Oct 28 15:03:03 2008 From: jk at jkraemer.net (=?utf-8?Q?Jens_Kr=c3=a4mer?=) Date: Tue, 28 Oct 2008 20:03:03 +0100 Subject: [Ferret-talk] List of terms matched by a query (and their position/offset) In-Reply-To: References: Message-ID: <50fff16eb744c02886f9c7e5744c832b@ruby-forum.com> Hi, first of all, please don't use the web forum to ask questions, but use the mailing list (ferret-talk at rubyforge.org). Unfortunately it seems that not every message posted here makes it to the mailing list, and I don't check the forum here very often... The other way around (messages posted via email) works reliably, so in the end you'll reach more people... Karl Meisterheim wrote: > Hi, > > I have some xml that represents a document. I parse the xml and place > specific parts (like the title) into the appropriate fields in my > document. The xml contains the normal document elements like a title, > body etc. It also contains illustrations, of which there may be 0 or > many for a given document. Each illustration also has a title and > caption text. > > I'm struggling to figure out how to index this data, since there are > many documents in my xml dataset and each document may have a random > number of illustrations. Therefore, I can't just add several fields to > my index like illustration1, illustration2, etc. 
> > Instead, the only way I can think to do it is grab all of the > illustration / caption text for a given document and glob it together > into one field, :illustration. > > This will work fine, searches will match terms in that field. The > problem comes when wanting to distinguish which illustration the term > belonged to. the answer is simple - whatever is the smallest unit you want to get as a search result is what you have to index. So if you want to find out which illustration a query matches you'll have to index each illustration as a separate document (in the Ferret sense of the word). You should then index the document's id along with each illustration, and maybe even shared information like the document title. Or build a separate index for global document data to avoid that redundancy. however then you would have to run each query twice - against the document index, and against the illustrations index. trade off between indexing speed (2 indexes and therefore no indexing of redundant information means faster indexing) versus search speed (searching once vs. searching twice for each user query)... Does that sound like it might work? Cheers, Jens -- Posted via http://www.ruby-forum.com/. From jk at jkraemer.net Tue Oct 28 15:38:16 2008 From: jk at jkraemer.net (=?utf-8?Q?Jens_Kr=c3=a4mer?=) Date: Tue, 28 Oct 2008 20:38:16 +0100 Subject: [Ferret-talk] Indexed? In-Reply-To: References: Message-ID: <526d80f8cc754fe26e198fd8256526db@ruby-forum.com> Hi, I'm not exactly sure what you're doing there, but 'and' is in Ferret's list of default stop words and therefore won't be indexed by default. maybe this is the whole problem? Cheers, Jens PS: please post via the real mailing list, this web interface sucks for posting. Tom Bak wrote: > Hi, > I have problem with indexing simple structire. > Indexing single element works, but dong it in a little bit more complex > code fails. [..] 
> # this gives no results
> puts index.search('word: "and"')
>
> # but this works:
> #index << CorpusIndexedElement.new("a test
> sentence",0,"test","xy").to_ferret_index_hash
> #puts index.search('word: "test"')
>
> I run out of ideas :(
>
> Cheers,
> Tomasz

--
Posted via http://www.ruby-forum.com/.

From jk at jkraemer.net Tue Oct 28 15:41:53 2008
From: jk at jkraemer.net (Jens Krämer)
Date: Tue, 28 Oct 2008 20:41:53 +0100
Subject: [Ferret-talk] acts_as_ferret - need to remove markup from column first
In-Reply-To:
References:
Message-ID: <2d05190a034bcde2af374b7a4b8e482d@ruby-forum.com>

Hi!

Add a 'virtual field' to your index that contains the value without
markup, e.g.

    class YourModel < ActiveRecord::Base
      # index the virtual field body_searchable instead of the raw body column
      acts_as_ferret :fields => [ :title, :body_searchable ]

      # strip_markup stands for whatever markup-removal helper you use
      def body_searchable
        strip_markup(self.body)
      end
    end

Cheers,
Jens

Gaudi Mi wrote:
> We're using acts_as_ferret on a column called 'body' which normally has
> html markup in it. We want to continue to store the markup in our
> database, but we want to remove the markup from that field before Ferret
> indexes it. We're thinking there would be an option to intercept the
> value that Ferret is indexing, for pre-processing, but we can't find it.
>
> So we either need to know how Ferret might support this requirement, or,
> we think we can skip the acts_as_ferret altogether and manually call
> LocalIndex.add or something like that and therefore have more control
> over what's being indexed.
>
> Thanks.

--
Posted via http://www.ruby-forum.com/.

From jk at jkraemer.net Tue Oct 28 15:10:41 2008
From: jk at jkraemer.net (Jens Krämer)
Date: Tue, 28 Oct 2008 20:10:41 +0100
Subject: [Ferret-talk] :multi search (again)
In-Reply-To: <93df9f1254abe7137442602d5d52c99f@ruby-forum.com>
References: <93df9f1254abe7137442602d5d52c99f@ruby-forum.com>
Message-ID:

Hi!
First of all you should decide which exact behaviour you want - do you want your search results contain different model objects like tracks, releases and artists? Or do you always want to find a certain type of model object, but also find it when somebody searches for a word contained in a property of some related record (like, say, find releases of an artist when searching for his name)? Or a combination of both, so that searching for an artists name will return all his releases and the artist record itself? First of all, every single model you expect in your result set has to call acts_as_ferret. Then, for every record, index all the information you think is useful, like you intend to do with the release names in your artist model. The artist_releases method has to return a string containing the value you want to index - so something like self.releases.map(&:name).join(' ') should work. Watch the logs to see what values actually get indexed. Cheers, Jens Dave Dave wrote: > apologies for the poor form, but I've modified the below > >> >> def artist_releases >> return "#{self.release_artist.release.name}" >> end >> > > to say this > > def artist_releases > self.releases > end > > this works, in as much as self.releases will return an array of releases > by an artists, because of the HABTM relationship defined previously. > However, I'm still confused as to how this then translates into the > ability to search for "an artist" and to be returned related releases by > said artist. > > If anyone can help, or offer some pointers on this, I'd be quite > appreciative. > > thanks -- Posted via http://www.ruby-forum.com/. 
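A minimal sketch of the artist_releases advice in the reply above, written in plain Ruby so it runs without Rails. The Release and Artist structs are stand-ins (assumptions) for the poster's ActiveRecord models; in the real application the method would live in the Artist model and be listed among its acts_as_ferret fields.

```ruby
# Stand-ins for the ActiveRecord models discussed in the thread
# (assumption: an artist has many releases, each with a name).
Release = Struct.new(:name)

Artist = Struct.new(:name, :releases) do
  # Return one string holding every release name, so the indexer
  # receives plain text rather than an array of objects.
  def artist_releases
    releases.map(&:name).join(' ')
  end
end

artist = Artist.new('Some Artist',
                    [Release.new('First Album'), Release.new('Second Album')])
puts artist.artist_releases
```

Returning the array itself, as in the quoted attempt, hands the indexer objects instead of text; joining the names gives it a single searchable string.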
From lyesjob at gmail.com Mon Oct 27 12:33:26 2008
From: lyesjob at gmail.com (Lyes Amazouz)
Date: Mon, 27 Oct 2008 17:33:26 +0100
Subject: [Ferret-talk] segmentation fault while combing stopfilter and stem filter
Message-ID: <60d886530810270933x870624dm61c33ba388480174@mail.gmail.com>

Hi list,

I've written my own analyzer that combines stemming and stop-word
filtering. My indexing works fine for a while, but then it crashes with a
segmentation fault. When I use the stemming alone, it works fine.

May I have some help? Thanks.

This is the declaration of my analyzer:

    class ContentAnalyzer < StandardAnalyzer
      def initialize()
        @stop_words = FULL_FRENCH_STOP_WORDS
      end

      # wrap the parent analyzer's token stream in a French stemmer
      def token_stream(field, input)
        Ferret::Analysis::StemFilter.new(super, "french")
      end
    end

PS: I've seen this combination on mailing lists, so it will be familiar to
some of you. Where is my mistake?

--
===========
| Lyes Amazouz
| USTHB, Algiers
===========

From jk at jkraemer.net Tue Oct 28 15:26:23 2008
From: jk at jkraemer.net (Jens Krämer)
Date: Tue, 28 Oct 2008 20:26:23 +0100
Subject: [Ferret-talk] AAF + DRb + Windows. Index works, search maybe?
In-Reply-To: <1365bdb22f3f0ce1aa349c7a8871bf9d@ruby-forum.com>
References: <1365bdb22f3f0ce1aa349c7a8871bf9d@ruby-forum.com>
Message-ID: <78aae8dc2b3454bc9b6f0b091ebdb2a6@ruby-forum.com>

Hi!

Chris Dekker wrote:
> Currently running the following setup:
> -Ferret 0.11.5
> -Acts as Ferret plugin 0.4.3 Rev.257
> -Rails 2.1.1
> -Ruby 1.8.6 Patchlevel 111
> -Windows 2003 Server
>
> This all works fine and dandy and after a lot of struggling with getting
> these specific versions together I managed to get it up and running on
> Windows. In all my models I have set the :remote => true option on the
> acts_as_ferret declaration so the DRb is used in production.
> The DRb runs fine and is defined in ferret_server.yml

My congratulations :)

> Index directory is created in the project folder and the DRb seems to
> index nicely in there. The top of the log says:
>
> DRb server: ensure_index_exists for class Contest
> Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil,
> looks like we are the server
> Will use local index.
> using index in C:/websites/BRProject/index/production/contest
>
> 1.) Is this correct? Or should I set this FERRET_USE_LOCAL_INDEX value?

No, just ignore that possibly misleading message. As long as the server
is self-aware enough to know it's the server, and the application does
not think it's the server, everything is ok :)

> 2.) Then after indexing all my teams in the database I get these lines
> in my ferret_server.log:
>
> Adding field name with value 'BLAH TEAM NAME' to index
> creating doc for class: Team, id: 4891
> Adding field name with value 'OTHER TEAM NAME' to index
> reindex model Team : 191.58% complete : -8.05 secs to finish
> Team Load (0.000000) SELECT * FROM `teams` WHERE
> (id > 4891) LIMIT 1000
> SQL (0.000000) COMMIT
> changing index dir to
> C:/websites/BRProject/index/production/team/20081025165554
> index dir is now
> C:/websites/BRProject/index/production/team/20081025165554
> #method_missing(:find_id_by_contents, ["Team", "Othe~ OR Team~",
> {:limit=>4}])
> #method_missing(:find_id_by_contents, ["Team", "J~ OR P~ OR Custom~",
> {:limit=>4}])
> #method_missing(:find_id_by_contents, ["Team", "Munchin~ OR Hilto~",
> {:limit=>4}])
>
> Followed by a similar 'method_missing' line about 1000 times. One for
> each apparent search query entered on the website.
Everything is ok, aaf is just a bit noisy :) you might want to raise the log level for your DRb server to get rid of these (at some point in time I introduced a configuration option for ferret_server.yml, not sure if you already have it or not in your version), or just comment out the relevant line of code that spams the log this way... > 3.) I feel like I am abusing the query language for this purpose: Fuzzy > finding team names. Often I type in a nearly exact match with the team > and it wont show up. Especially team names with '.com' in it for > example. Or exact matches yield different team names with a higher hit > rate. I often find myself doing the same. I go even further - I build different query variants from the query string the user entered, with a varying degree of exactness (i.e. exact phrase match, match with wildcards, fuzzy match), and OR them together with different boosts. this way more relevant matches are likely to be on top of the result list because their sub query receives a higher boost, but you also find matches further away from the original query in case the user had a typo or so... > Also there is a problem with matching certain reserved words. > Apparently querying the States DB with 'IN' for Indiana is not going to > work :) lowercase 'in' should work (in case you also index lowercase 'in', of course), imho the reserved words are uppercase only. Or don't use the build in query parser but construct your own query objects (which however might not work with DRb if you build them in your application) Cheers, Jens -- Posted via http://www.ruby-forum.com/. From kmeister2000 at gmail.com Fri Oct 31 10:59:58 2008 From: kmeister2000 at gmail.com (Karl Meisterheim) Date: Fri, 31 Oct 2008 10:59:58 -0400 Subject: [Ferret-talk] List of terms matched by a query (and their position/offset) Message-ID: <1cce07830810310759u94613adx86e6cb42b4493183@mail.gmail.com> Hi Jens, Thanks for the reply. 
What you say makes sense, but I'm hoping for a simpler implementation. I guess it boils down to this: When I conduct a search in ferret / AAF, I get an array of documents back. Somehow, the highlight method knows where the terms that were matched by the search exist in those documents / field. Is there anyway that I can get that information? I looked through the API and even the source and unfortunately, couldn't quite grok how it was happening. This will allow me to do the following, I can chunk together several distinct pieces of information into one field for the purpose of indexing. Then, if I know which terms were matched and their position in the field, I can use that information to figure out which piece of information in that single field it came from. For example, if my document has six figures, I take the caption text of those six figures and concatenate them all together and index them in one field, captions. Then, when the document comes back as a match, if I knew that the term "empire" matched whatever search query was used, and that the offset was 100 in my captions field, I could piece together which of the original illustrations it belongs to. Again, this is all necessary because I don't know in advance how many figures a given document may contain. I know this sounds overly complicated, but I think it's easier than creating a new model, a separate index and then having to search multiple times etc. Does this make sense, or am I going at this completely wrong? Thank you, -km
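
The bookkeeping described above can be sketched in plain Ruby, independent of Ferret. This assumes a match offset into the combined captions field can be obtained from somewhere; nothing in this thread confirms that Ferret or AAF exposes it. The helper names build_caption_field and caption_index_for are hypothetical, and offsets are counted in characters (an index that reports byte offsets would need bytesize instead of length).

```ruby
# Hypothetical helper: join the captions into one field value and record
# the character offset at which each caption starts.
def build_caption_field(captions, separator = ' ')
  offsets = []
  position = 0
  captions.each do |caption|
    offsets << position
    position += caption.length + separator.length
  end
  [captions.join(separator), offsets]
end

# Hypothetical helper: index of the caption containing match_offset,
# i.e. the last caption whose start offset is <= match_offset.
def caption_index_for(offsets, match_offset)
  offsets.rindex { |start| start <= match_offset }
end

captions = ['The Roman empire', 'A map', 'Fall of the empire']
field, offsets = build_caption_field(captions)

# Simulate a match offset by locating the last "empire" in the field.
match_offset = field.rindex('empire')
puts captions[caption_index_for(offsets, match_offset)]
```

The start offsets would have to be stored alongside each document (for example in an unindexed field) so they are available when a hit comes back.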