From samuelgiffney at gmail.com Tue Aug 1 00:48:25 2006
From: samuelgiffney at gmail.com (Sam Giffney)
Date: Tue, 1 Aug 2006 06:48:25 +0200
Subject: [Ferret-talk] Per field boost values - possible? working?
In-Reply-To:
References:
Message-ID: <8057bb55df1ad875864054c6cfd9a273@ruby-forum.com>

Thanks Dave,
Using the explain method proved it was definitely working. The boost
value I was using, 2.0, just wasn't enough to change the placing in the
test I was using.

What are the (highlights of the) changes to the index that make it
incompatible with Luke? Just wondering what would be involved...

--
Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Tue Aug 1 01:09:36 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 14:09:36 +0900
Subject: [Ferret-talk] Per field boost values - possible? working?
In-Reply-To: <8057bb55df1ad875864054c6cfd9a273@ruby-forum.com>
References: <8057bb55df1ad875864054c6cfd9a273@ruby-forum.com>
Message-ID:

On 8/1/06, Sam Giffney wrote:
> Thanks Dave,
> Using the explain method proved it was definitely working. The boost
> value I was using, 2.0, just wasn't enough to change the placing in the
> test I was using.

Great. One thing I neglected to mention was that the field_norm value
that you see in the Index#explain output is actually the field boost
(I may change the name as it's not really clear). You'll notice that
1.0 and 2.0 get converted to 0.625 and 1.25 respectively. This is
because the boost gets compressed into a single byte, so it loses a
lot of its precision. This is just something to keep in mind when
setting boost values.

> What are the (highlights of the) changes to the index that make it
> incompatible with Luke? Just wondering what would be involved...

The only thing staying the same is the field norms files. Everything
else is changing so it wouldn't be worth doing it in Java using any of
the existing Luke code. It'd have to be completely rewritten in Ruby.
I haven't done any GUI stuff in ruby before so I'm not sure which
library would be best. If anyone has any recommendations I could
probably start something and then others could play around with it.

Cheers,
Dave

From kraemer at webit.de Tue Aug 1 04:22:38 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 1 Aug 2006 10:22:38 +0200
Subject: [Ferret-talk] Per field boost values - possible? working?
In-Reply-To:
References: <8057bb55df1ad875864054c6cfd9a273@ruby-forum.com>
Message-ID: <20060801082238.GD26391@cordoba.webit.de>

On Tue, Aug 01, 2006 at 02:09:36PM +0900, David Balmain wrote:
[..]
> The only thing staying the same is the field norms files. Everything
> else is changing so it wouldn't be worth doing it in Java using any of
> the existing Luke code. It'd have to be completely rewritten in Ruby.
>
> I haven't done any GUI stuff in ruby before so I'm not sure which
> library would be best. If anyone has any recommendations I could
> probably start something and then others could play around with it.

I've started porting Luke to Ruby/Gtk a while ago. It's far from
complete but I could make available what I have so far.

But don't expect anything too fancy, I haven't done any Gui stuff
with Ruby or Gtk before that, too ;-)

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From dbalmain.ml at gmail.com Tue Aug 1 04:44:09 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 17:44:09 +0900
Subject: [Ferret-talk] Per field boost values - possible? working?
In-Reply-To: <20060801082238.GD26391@cordoba.webit.de>
References: <8057bb55df1ad875864054c6cfd9a273@ruby-forum.com>
 <20060801082238.GD26391@cordoba.webit.de>
Message-ID:

On 8/1/06, Jens Kraemer wrote:
> On Tue, Aug 01, 2006 at 02:09:36PM +0900, David Balmain wrote:
> [..]
> > The only thing staying the same is the field norms files.
> > Everything else is changing so it wouldn't be worth doing it in Java
> > using any of the existing Luke code. It'd have to be completely
> > rewritten in Ruby.
> >
> > I haven't done any GUI stuff in ruby before so I'm not sure which
> > library would be best. If anyone has any recommendations I could
> > probably start something and then others could play around with it.
>
> I've started porting Luke to Ruby/Gtk a while ago. It's far from
> complete but I could make available what I have so far.
>
> But don't expect anything too fancy, I haven't done any Gui stuff
> with Ruby or Gtk before that, too ;-)

Cool, I'd love to see it.

From Pedro.CorteReal at iantt.pt Tue Aug 1 05:30:17 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 10:30:17 +0100
Subject: [Ferret-talk] Sorting performance
In-Reply-To: <20060731173617.GB19848@cordoba.webit.de>
References: <1154340332.5397.2.camel@localhost.localdomain>
 <1154341604.5397.6.camel@localhost.localdomain>
 <1154358603.5397.11.camel@localhost.localdomain>
 <20060731173617.GB19848@cordoba.webit.de>
Message-ID: <1154424617.757.2.camel@localhost.localdomain>

On Mon, 2006-07-31 at 19:36 +0200, Jens Kraemer wrote:
> > I forgot that I was actually supplying my own #to_doc so it was a matter
> > of changing it to not tokenize the fields I want. When using
> > acts_as_ferret the regular way I don't know if this is possible.
>
> it is, just provide a hash with the desired options to each field name:
>
> acts_as_ferret(
>   :fields => {
>     'title' => { :boost => 2 },
>     'description' => { :boost => 1,
>                        :index => Ferret::Document::Field::Index::UNTOKENIZED
>                      }
>   })
>
> options that can be set this way are (with their defaults given):
>
> :store => Ferret::Document::Field::Store::NO
> :index => Ferret::Document::Field::Index::TOKENIZED
> :term_vector => Ferret::Document::Field::TermVector::NO
> :binary => false
> :boost => 1.0

Cool. Didn't know about this.
I started reading the code to understand how it worked but then
remembered I was doing my own to_doc so I should just change that. I'll
be sure to remember that for any future projects.

By the way, does storing the TermVectors only increase the size of the
index or does it alter performance in any way?

Pedro.

From Pedro.CorteReal at iantt.pt Tue Aug 1 05:36:04 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 10:36:04 +0100
Subject: [Ferret-talk] Sorting performance
In-Reply-To:
References: <1154340332.5397.2.camel@localhost.localdomain>
 <1154358696.5397.14.camel@localhost.localdomain>
Message-ID: <1154424965.757.7.camel@localhost.localdomain>

On Tue, 2006-08-01 at 09:24 +0900, David Balmain wrote:
> How many documents and what is the date range (eg 2001-01-01 ->
> 2006-08-01). These are the critical variables for sort performance.
> Once I know these numbers I'll be able to replicate the task here and
> I'll see what I can do.

I have around 600_000 documents and the date range is rather large,
something like from year 1000 to now. I don't know for sure but I can
check if it makes a difference.

But not all my sort fields are dates. I also have regular text fields
that I have now made untokenized (by using separate fields for sorting
and searching). Got to check if that made them faster.

> > Don't think I'm being critical, ferret is great software, many thanks
> > for it.
>
> No offence taken. I'd definitely like to be able to help. I'm guessing
> I'll probably have to optimize the C code to rectify this.

That would be great,

Thanks,

Pedro.
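
[Editorial sketch] The "separate fields for sorting and searching" idea
mentioned above could look roughly like the following. This is only an
illustration under assumptions: sort_fields_for and the field names
title_sort and date_sort are hypothetical, the result is a plain Hash
rather than a Ferret document, and nothing here calls Ferret or
acts_as_ferret.

```ruby
require 'date'

# Hypothetical helper (not part of Ferret or acts_as_ferret): returns a plain
# Hash holding the searchable field plus separate keys meant only for sorting.
def sort_fields_for(title, date)
  {
    'title'      => title,                   # the field that would be searched
    'title_sort' => title.strip.downcase,    # one normalized value per record
    'date_sort'  => date.strftime('%Y%m%d')  # fixed-width "YYYYMMDD" string
  }
end

doc = sort_fields_for('  Royal Charter ', Date.new(1000, 3, 7))
puts doc['title_sort']
puts doc['date_sort']
```

In a custom to_doc the sort-only values would presumably be added as
untokenized fields, so that each document contributes exactly one term
per sort field.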
From dbalmain.ml at gmail.com Tue Aug 1 05:39:14 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 18:39:14 +0900
Subject: [Ferret-talk] Sorting performance
In-Reply-To: <1154424617.757.2.camel@localhost.localdomain>
References: <1154340332.5397.2.camel@localhost.localdomain>
 <1154341604.5397.6.camel@localhost.localdomain>
 <1154358603.5397.11.camel@localhost.localdomain>
 <20060731173617.GB19848@cordoba.webit.de>
 <1154424617.757.2.camel@localhost.localdomain>
Message-ID:

On 8/1/06, Pedro Côrte-Real wrote:
> On Mon, 2006-07-31 at 19:36 +0200, Jens Kraemer wrote:
> > > I forgot that I was actually supplying my own #to_doc so it was a matter
> > > of changing it to not tokenize the fields I want. When using
> > > acts_as_ferret the regular way I don't know if this is possible.
> >
> > it is, just provide a hash with the desired options to each field name:
> >
> > acts_as_ferret(
> >   :fields => {
> >     'title' => { :boost => 2 },
> >     'description' => { :boost => 1,
> >                        :index => Ferret::Document::Field::Index::UNTOKENIZED
> >                      }
> >   })
> >
> > options that can be set this way are (with their defaults given):
> >
> > :store => Ferret::Document::Field::Store::NO
> > :index => Ferret::Document::Field::Index::TOKENIZED
> > :term_vector => Ferret::Document::Field::TermVector::NO
> > :binary => false
> > :boost => 1.0
>
> By the way, does storing the TermVectors only increase the size of the
> index or does it alter performance in any way?

It increases the size of the index and affects indexing performance
since a lot of extra data needs to be written and merged during the
indexing process. Search performance won't be affected.

Dave

From Pedro.CorteReal at iantt.pt Tue Aug 1 05:52:01 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 10:52:01 +0100
Subject: [Ferret-talk] Sorting performance
In-Reply-To:
References: <1154340332.5397.2.camel@localhost.localdomain>
 <1154341604.5397.6.camel@localhost.localdomain>
 <1154358603.5397.11.camel@localhost.localdomain>
 <20060731173617.GB19848@cordoba.webit.de>
 <1154424617.757.2.camel@localhost.localdomain>
Message-ID: <1154425921.757.9.camel@localhost.localdomain>

On Tue, 2006-08-01 at 18:39 +0900, David Balmain wrote:
> > By the way, does storing the TermVectors only increase the size of the
> > index or does it alter performance in any way?
>
> It increases the size of the index and affects indexing performance
> since a lot of extra data needs to be written and merged during the
> indexing process. Search performance won't be affected.

Ah, but the default when creating a new field is already not to store it
so I'm already doing it.

Pedro.

From dbalmain.ml at gmail.com Tue Aug 1 05:59:29 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 18:59:29 +0900
Subject: [Ferret-talk] Sorting performance
In-Reply-To: <1154424965.757.7.camel@localhost.localdomain>
References: <1154340332.5397.2.camel@localhost.localdomain>
 <1154358696.5397.14.camel@localhost.localdomain>
 <1154424965.757.7.camel@localhost.localdomain>
Message-ID:

On 8/1/06, Pedro Côrte-Real wrote:
> On Tue, 2006-08-01 at 09:24 +0900, David Balmain wrote:
> > How many documents and what is the date range (eg 2001-01-01 ->
> > 2006-08-01). These are the critical variables for sort performance.
> > Once I know these numbers I'll be able to replicate the task here and
> > I'll see what I can do.
>
> I have around 600_000 documents and the date range is rather large,
> something like from year 1000 to now. I don't know for sure but I can
> check if it makes a difference.
>
> But not all my sort fields are dates.
> I also have regular text fields that I have now made untokenized (by
> using separate fields for sorting and searching). Got to check if
> that made them faster.

Hmmm. Sounds like an interesting application. One solution would be to
cache the sort index on disk. The problem with this is that the cache
would still need to be recalculated every time you add more documents
to the index so you'll still have the long wait occasionally. I'll
look into it anyway at a later stage.

Another idea that I can implement now is to add a BYTES sort type
which would basically sort by the order the terms appear in the index.
Let's say you index dates in the format "YYYYMMDD" and you sort by
INTEGER. Every time you load the sort index you need to go through
every single date and convert it from string to integer. But this is
unnecessary since the dates are already in order in the index. A BYTES
sort type would take advantage of this. You'd get an even bigger
benefit for ASCII strings. strcoll is used to sort strings but this is
unnecessary for ASCII strings as they are already correctly ordered in
the index. Also, the index needs to keep each string in memory which
would also be unnecessary.

Sorry if this isn't very clear. I'm not sure how much it will help.
We'll have to wait and see.

Dave

From Pedro.CorteReal at iantt.pt Tue Aug 1 06:08:23 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 11:08:23 +0100
Subject: [Ferret-talk] Sorting performance
In-Reply-To:
References: <1154340332.5397.2.camel@localhost.localdomain>
 <1154358696.5397.14.camel@localhost.localdomain>
 <1154424965.757.7.camel@localhost.localdomain>
Message-ID: <1154426904.757.16.camel@localhost.localdomain>

On Tue, 2006-08-01 at 18:59 +0900, David Balmain wrote:
> Hmmm. Sounds like an interesting application. One solution would be to
> cache the sort index on disk.
> The problem with this is that the cache would still need to be
> recalculated every time you add more documents to the index so you'll
> still have the long wait occasionally. I'll look into it anyway at a
> later stage.

For my application this wouldn't really be a problem since data is only
loaded maybe once a week. But does the cache need to be recalculated
completely? Database indexes work incrementally.

> Another idea that I can implement now is to add a BYTES sort type
> which would basically sort by the order the terms appear in the index.
> Let's say you index dates in the format "YYYYMMDD" and you sort by
> INTEGER. Every time you load the sort index you need to go through
> every single date and convert it from string to integer. But this is
> unnecessary since the dates are already in order in the index. A BYTES
> sort type would take advantage of this.

For my date fields this would work.

> You'd get an even bigger
> benefit for ASCII strings. strcoll is used to sort strings but this is
> unnecessary for ASCII strings as they are already correctly ordered in
> the index. Also, the index needs to keep each string in memory which
> would also be unnecessary.

One of my text order fields should have nothing but ASCII. The other is
a title and can include arbitrary UTF-8, so I guess it wouldn't work for
that one.

> Sorry if this isn't very clear. I'm not sure how much it will help.
> We'll have to wait and see.

The BYTES ordering would probably speed it up but for my specific case,
storing it on disk would be perfect. It would probably be a very good
thing in case someone uses ferret to code command line tools that access
a common index. Without storing the sorting on disk it will get
recreated every time a command is run.

Pedro.
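
[Editorial sketch] The reasoning behind the proposed BYTES sort type can
be checked in plain Ruby: for fixed-width "YYYYMMDD" strings, comparing
the raw bytes gives the same order as converting every value to an
integer first. The sample dates below are made up, and the code does not
use Ferret; BYTES is only a proposal in this thread, not an API shown
here.

```ruby
# Dates indexed as fixed-width "YYYYMMDD" strings (sample data, made up here).
dates = %w[20060801 10000101 19991231 20010101 15000615]

# INTEGER-style sort: every term is converted before it is compared.
by_integer = dates.sort_by { |d| d.to_i }

# BYTES-style sort: the raw strings are compared byte by byte, no conversion.
by_bytes = dates.sort

puts by_integer == by_bytes   # prints true

# Without a fixed width the two orders diverge, so the zero-padded format
# is what makes the shortcut safe.
unpadded = %w[9 10 100]
puts unpadded.sort == unpadded.sort_by { |s| s.to_i }   # prints false
```

The same argument does not carry over to the UTF-8 title field mentioned
above, where byte order and locale-aware collation can disagree.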
From dbalmain.ml at gmail.com Tue Aug 1 06:32:50 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 19:32:50 +0900
Subject: [Ferret-talk] Sorting performance
In-Reply-To: <1154426904.757.16.camel@localhost.localdomain>
References: <1154340332.5397.2.camel@localhost.localdomain>
 <1154358696.5397.14.camel@localhost.localdomain>
 <1154424965.757.7.camel@localhost.localdomain>
 <1154426904.757.16.camel@localhost.localdomain>
Message-ID:

On 8/1/06, Pedro Côrte-Real wrote:
> On Tue, 2006-08-01 at 18:59 +0900, David Balmain wrote:
> > Hmmm. Sounds like an interesting application. One solution would be to
> > cache the sort index on disk. The problem with this is that the cache
> > would still need to be recalculated every time you add more documents
> > to the index so you'll still have the long wait occasionally. I'll
> > look into it anyway at a later stage.
>
> For my application this wouldn't really be a problem since data is only
> loaded maybe once a week. But does the cache need to be recalculated
> completely? Database indexes work incrementally.

Sure it's possible but it's a fair bit of work. Lucene doesn't have
anything like this yet (not that that has stopped me adding features
before). I'll think about it.

Dave

From Pedro.CorteReal at iantt.pt Tue Aug 1 07:29:33 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 12:29:33 +0100
Subject: [Ferret-talk] Problem importing lots of records
Message-ID: <1154431774.757.24.camel@localhost.localdomain>

I run a script that imports a few thousand records into the database.
The script runs once for each of several XML files. What it does is
parse the XML and for each element of a certain type creates a record in
a rails database that gets indexed with acts_as_ferret.
This worked fine before but today after a few files (70000 records)
this started to happen for any file I tried:

./imports/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `<<': : Error occured at :703 (StandardError)
Error: exception 6 not handled: Could not obtain write lock when trying to write index
	from ./imports/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `ferret_create'
	from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:333:in `callback'
	from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:330:in `callback'
	from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:262:in `create_without_timestamps'
	from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/timestamp.rb:30:in `create'
	from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1718:in `create_or_update_without_callbacks'
	from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:249:in `create_or_update'
	from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1392:in `save_without_validation'
	 ... 24 levels...
	from /usr/lib/ruby/1.8/rexml/document.rb:173:in `parse_stream'
	from /home/pedrocr/xmlcodec/lib/stream_parser.rb:74:in `parse'
	from ./imports/import-ead:118
	from ./imports/import-ead:115

The index is now in this state. There is no process running that could
be holding a lock. Seems like some kind of race condition. I can provide
the index (12MB) if it would help debug this.

The only thing that I can see that might have changed behaviour is that
I added some untokenized fields to sort by.

Any ideas what it might be?

Pedro.
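
[Editorial sketch] One possible cause of a "Could not obtain write lock"
error is a lock file left behind in the index directory; the reply that
follows in this thread suggests looking for .lck files there. A minimal
way to list such files is sketched below. The helper name
stray_lock_files and the file names in the demonstration are made up,
and the helper only reports what it finds; whether a given file is safe
to delete depends on no Index, IndexWriter, IndexReader or Searcher
still being open.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: returns the paths of any *.lck files found directly
# inside index_dir. It only lists them; deleting is left to the caller.
def stray_lock_files(index_dir)
  Dir.glob(File.join(index_dir, '*.lck')).sort
end

# Demonstration against a throwaway directory with made-up file names.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'segments'))
  FileUtils.touch(File.join(dir, 'example-write.lck'))
  puts stray_lock_files(dir).map { |path| File.basename(path) }.inspect
end
```

Running it against the real index path before and after an import would
show whether a lock file survives the end of the script.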
From dbalmain.ml at gmail.com Tue Aug 1 08:08:53 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 21:08:53 +0900
Subject: [Ferret-talk] Problem importing lots of records
In-Reply-To: <1154431774.757.24.camel@localhost.localdomain>
References: <1154431774.757.24.camel@localhost.localdomain>
Message-ID:

On 8/1/06, Pedro Côrte-Real wrote:
> I run a script that imports a few thousand records into the database.
> The script runs once for each of several XML files. What it does is
> parse the XML and for each element of a certain type creates a record in
> a rails database that gets indexed with acts_as_ferret. This worked fine
> before but today after a few files (70000 records) this started to
> happen for any file I tried:
>
> ./imports/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `<<': : Error occured at :703 (StandardError)
> Error: exception 6 not handled: Could not obtain write lock when trying
> to write index
> from ./imports/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `ferret_create'
> from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:333:in `callback'
> from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:330:in `callback'
> from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:262:in `create_without_timestamps'
> from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/timestamp.rb:30:in `create'
> from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1718:in `create_or_update_without_callbacks'
> from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:249:in `create_or_update'
> from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1392:in `save_without_validation'
> ... 24 levels...
> from /usr/lib/ruby/1.8/rexml/document.rb:173:in `parse_stream'
> from /home/pedrocr/xmlcodec/lib/stream_parser.rb:74:in `parse'
> from ./imports/import-ead:118
> from ./imports/import-ead:115
>
>
> The index is now in this state. There is no process running that could
> be holding a lock. Seems like some kind of race condition. I can provide
> the index (12MB) if it would help debug this.
>
> The only thing that I can see that might have changed behaviour is that
> I added some untokenized fields to sort by.
>
> Any ideas what it might be?

This error occurs when there is a lock file open in the index. The
first thing to do is make sure you don't have any Index, IndexWriter,
IndexReader or Searcher open on the index. These are the classes that
lock the index. If this is already the case you have a stray lock file
which you can just delete. Look in your index for .lck files. They can
be deleted. Also note that this probably means that some of the
data that appeared to be added correctly to the index may not have
been. The index should still be valid otherwise.

Dave

From Pedro.CorteReal at iantt.pt Tue Aug 1 08:49:40 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 13:49:40 +0100
Subject: [Ferret-talk] Problem importing lots of records
In-Reply-To:
References: <1154431774.757.24.camel@localhost.localdomain>
Message-ID: <1154436580.757.28.camel@localhost.localdomain>

On Tue, 2006-08-01 at 21:08 +0900, David Balmain wrote:
> > The only thing that I can see that might have changed behaviour is that
> > I added some untokenized fields to sort by.
> >
> > Any ideas what it might be?
>
> This error occurs when there is a lock file open in the index. The
> first thing to do is make sure you don't have any Index, IndexWriter,
> IndexReader or Searcher open on the index. These are the classes that
> lock the index.

The only thing accessing the index is acts_as_ferret and that should
release it when ruby finishes, right?
I'm not running it on a webserver, it's running in a script that does
the same as the rails script/console. Could there be some race
condition that makes it sometimes not finish?

> If this is already the case you have a stray lock file
> which you can just delete. Look in your index for .lck files. They can
> be deleted. Also note that this probably means that some of the
> data that appeared to be added correctly to the index may not have
> been. The index should still be valid otherwise.

Have to try to understand what's happening then. It seems to be
consistently reproducible.

Pedro.

From dbalmain.ml at gmail.com Tue Aug 1 09:22:03 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 22:22:03 +0900
Subject: [Ferret-talk] Problem importing lots of records
In-Reply-To: <1154436580.757.28.camel@localhost.localdomain>
References: <1154431774.757.24.camel@localhost.localdomain>
 <1154436580.757.28.camel@localhost.localdomain>
Message-ID:

On 8/1/06, Pedro Côrte-Real wrote:
> On Tue, 2006-08-01 at 21:08 +0900, David Balmain wrote:
> > > The only thing that I can see that might have changed behaviour is that
> > > I added some untokenized fields to sort by.
> > >
> > > Any ideas what it might be?
> >
> > This error occurs when there is a lock file open in the index. The
> > first thing to do is make sure you don't have any Index, IndexWriter,
> > IndexReader or Searcher open on the index. These are the classes that
> > lock the index.
>
> The only thing accessing the index is acts_as_ferret and that should
> release it when ruby finishes, right? I'm not running it on a webserver,
> it's running in a script that does the same as the rails script/console.
> Could there be some race condition that makes it sometimes not finish?

No, I don't think so. The only thing I can think of is it could be a
reference counting error in the Ruby-Ferret bindings.

> > If this is already the case you have a stray lock file
> > which you can just delete.
> > Look in your index for .lck files. They can be deleted. Also note
> > that this probably means that some of the data that appeared to be
> > added correctly to the index may not have been. The index should
> > still be valid otherwise.
>
> Have to try to understand what's happening then. It seems to be
> consistently reproducible.

If you can distill the error to a smaller unit test or something that
I can reproduce here that would be great. It shouldn't take me
long to find the problem.

Cheers,
Dave

From Pedro.CorteReal at iantt.pt Tue Aug 1 10:02:30 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 15:02:30 +0100
Subject: [Ferret-talk] Problem importing lots of records
In-Reply-To:
References: <1154431774.757.24.camel@localhost.localdomain>
 <1154436580.757.28.camel@localhost.localdomain>
Message-ID: <1154440951.757.30.camel@localhost.localdomain>

On Tue, 2006-08-01 at 22:22 +0900, David Balmain wrote:
> > Have to try to understand what's happening then. It seems to be
> > consistently reproducible.
>
> If you can distill the error to a smaller unit test or something that
> I can reproduce here that would be great. It shouldn't take me
> long to find the problem.

I'm trying to do that. The last time glibc threw a double free
corruption warning. I'm trying to reproduce it more exactly.

Pedro.

From Pedro.CorteReal at iantt.pt Tue Aug 1 12:06:26 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 17:06:26 +0100
Subject: [Ferret-talk] Problem importing lots of records
In-Reply-To: <1154440951.757.30.camel@localhost.localdomain>
References: <1154431774.757.24.camel@localhost.localdomain>
 <1154436580.757.28.camel@localhost.localdomain>
 <1154440951.757.30.camel@localhost.localdomain>
Message-ID: <1154448386.757.32.camel@localhost.localdomain>

On Tue, 2006-08-01 at 15:02 +0100, Pedro Côrte-Real wrote:
> I'm trying to do that.
> The last time glibc threw a double free corruption warning. I'm
> trying to reproduce it more exactly.

Still haven't been able to do it completely but here's the exact error:

*** glibc detected *** double free or corruption (!prev): 0x0847e5b8 ***

Don't know if it's possible to go from that hex value to the code line.
Probably not.

Pedro.

From Pedro.CorteReal at iantt.pt Tue Aug 1 12:14:48 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Tue, 01 Aug 2006 17:14:48 +0100
Subject: [Ferret-talk] Problem importing lots of records
In-Reply-To: <1154448386.757.32.camel@localhost.localdomain>
References: <1154431774.757.24.camel@localhost.localdomain>
 <1154436580.757.28.camel@localhost.localdomain>
 <1154440951.757.30.camel@localhost.localdomain>
 <1154448386.757.32.camel@localhost.localdomain>
Message-ID: <1154448888.757.35.camel@localhost.localdomain>

On Tue, 2006-08-01 at 17:06 +0100, Pedro Côrte-Real wrote:
> On Tue, 2006-08-01 at 15:02 +0100, Pedro Côrte-Real wrote:
> > I'm trying to do that. The last time glibc threw a double free
> > corruption warning. I'm trying to reproduce it more exactly.
>
> Still haven't been able to do it completely but here's the exact error:
>
> *** glibc detected *** double free or corruption (!prev): 0x0847e5b8 ***
>
> Don't know if it's possible to go from that hex value to the code line.
> Probably not.

Here's a backtrace:

#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7cf19a1 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7cf32b9 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0xb7d2587a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
#4  0xb7d2bfd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6
#5  0xb7d2c34a in free () from /lib/tls/i686/cmov/libc.so.6
#6  0xb73e8cae in tb_destroy (tb=0x0) at term.c:91
#7  0xb73e8ffc in te_destroy (te=0x49) at term.c:214
#8  0xb73e9345 in ste_close (te=0x87444c0) at term.c:326
#9  0xb73e1b2b in smi_destroy (smi=0x8a7b5f8) at index_rw.c:1984
#10 0xb73e3661 in sm_merge_term_infos (sm=0x8a66588) at index_rw.c:2282
#11 0xb73e3827 in sm_merge_terms (sm=0x8a66588) at index_rw.c:2314
#12 0xb73e3fc2 in sm_merge (sm=0x8a66588) at index_rw.c:2384
#13 0xb73e406c in iw_merge_segments_with_max (iw=0x8396400, min_segment=18, max_segment=20) at index_rw.c:829
#14 0xb73e42a7 in iw_merge_segments (iw=0x6, min_segment=0) at index_rw.c:860
#15 0xb73e4311 in iw_flush_ram_segments (iw=0x8396400) at index_rw.c:918
#16 0xb73e4376 in iw_close (iw=0x8396400) at index_rw.c:965
#17 0xb73f6c2b in index_add_doc (self=0x8ad5278, doc=0x86e63d8) at ind.c:218
#18 0xb73d4a09 in frt_ind_add_doc (argc=1, argv=0x0, self=3075201096) at r_search.c:1900
#19 0xb7e7e5d3 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#20 0xb7e89089 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#21 0xb7e89aef in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#22 0xb7e8675b in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8

I seem to be able to consistently generate this. What else should I do?

Pedro.
From dbalmain.ml at gmail.com Tue Aug 1 19:41:33 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 2 Aug 2006 08:41:33 +0900
Subject: [Ferret-talk] Problem importing lots of records
In-Reply-To: <1154448888.757.35.camel@localhost.localdomain>
References: <1154431774.757.24.camel@localhost.localdomain>
 <1154436580.757.28.camel@localhost.localdomain>
 <1154440951.757.30.camel@localhost.localdomain>
 <1154448386.757.32.camel@localhost.localdomain>
 <1154448888.757.35.camel@localhost.localdomain>
Message-ID:

On 8/2/06, Pedro Côrte-Real wrote:
> On Tue, 2006-08-01 at 17:06 +0100, Pedro Côrte-Real wrote:
> > On Tue, 2006-08-01 at 15:02 +0100, Pedro Côrte-Real wrote:
> > > I'm trying to do that. The last time glibc threw a double free
> > > corruption warning. I'm trying to reproduce it more exactly.
> >
> > Still haven't been able to do it completely but here's the exact error:
> >
> > *** glibc detected *** double free or corruption (!prev): 0x0847e5b8 ***
> >
> > Don't know if it's possible to go from that hex value to the code line.
> > Probably not.
>
> Here's a backtrace:
>
> #0 0xffffe410 in __kernel_vsyscall ()
> #1 0xb7cf19a1 in raise () from /lib/tls/i686/cmov/libc.so.6
> #2 0xb7cf32b9 in abort () from /lib/tls/i686/cmov/libc.so.6
> #3 0xb7d2587a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
> #4 0xb7d2bfd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6
> #5 0xb7d2c34a in free () from /lib/tls/i686/cmov/libc.so.6
> #6 0xb73e8cae in tb_destroy (tb=0x0) at term.c:91
> #7 0xb73e8ffc in te_destroy (te=0x49) at term.c:214
> #8 0xb73e9345 in ste_close (te=0x87444c0) at term.c:326
> #9 0xb73e1b2b in smi_destroy (smi=0x8a7b5f8) at index_rw.c:1984
> #10 0xb73e3661 in sm_merge_term_infos (sm=0x8a66588) at index_rw.c:2282
> #11 0xb73e3827 in sm_merge_terms (sm=0x8a66588) at index_rw.c:2314
> #12 0xb73e3fc2 in sm_merge (sm=0x8a66588) at index_rw.c:2384
> #13 0xb73e406c in iw_merge_segments_with_max (iw=0x8396400, min_segment=18,
> max_segment=20) at index_rw.c:829
> #14 0xb73e42a7 in iw_merge_segments (iw=0x6, min_segment=0) at index_rw.c:860
> #15 0xb73e4311 in iw_flush_ram_segments (iw=0x8396400) at index_rw.c:918
> #16 0xb73e4376 in iw_close (iw=0x8396400) at index_rw.c:965
> #17 0xb73f6c2b in index_add_doc (self=0x8ad5278, doc=0x86e63d8) at ind.c:218
> #18 0xb73d4a09 in frt_ind_add_doc (argc=1, argv=0x0, self=3075201096)
> at r_search.c:1900
> #19 0xb7e7e5d3 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
> #20 0xb7e89089 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
> #21 0xb7e89aef in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
> #22 0xb7e8675b in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
>
> I seem to be able to consistently generate this. What else should I do?

This would usually be enough but there seems to be some weird stuff
happening there. For example, argv = NULL on line #18 which doesn't
seem possible to me. On line #14 iw gets a strange value but then it
gets restored on line #13.
Anyway, I think the error is occurring higher up on the stack so maybe it doesn't matter. I'll keep trying to work out what is going on but it'd be a lot easier if you could give me something I could run here to reproduce the error. Cheers, Dave From dbalmain.ml at gmail.com Tue Aug 1 22:13:38 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 2 Aug 2006 11:13:38 +0900 Subject: [Ferret-talk] Sorting performance In-Reply-To: <1154426904.757.16.camel@localhost.localdomain> References: <1154340332.5397.2.camel@localhost.localdomain> <1154358696.5397.14.camel@localhost.localdomain> <1154424965.757.7.camel@localhost.localdomain> <1154426904.757.16.camel@localhost.localdomain> Message-ID: On 8/1/06, Pedro Côrte-Real wrote: > On Tue, 2006-08-01 at 18:59 +0900, David Balmain wrote: > > Hmmm. Sounds like an interesting application. One solution would be to > > cache the sort index on disk. The problem with this is that the cache > > would still need to be recalculated every time you add more documents > > to the index so you'll still have the long wait occasionally. I'll > > look into it anyway at a later stage. > > For my application this wouldn't really be a problem since data is only > loaded maybe once a week. But does the cache need to be recalculated > completely? Database indexes work incrementally. Have you tried optimizing your index? I found an order of magnitude difference in speed here with an optimized index. Even with 1,000,000 unique documents, though, sorting takes less than 10 seconds for an unoptimized index and less than 1 second for an optimized index. What kind of system are you running on?
Dave From Pedro.CorteReal at iantt.pt Wed Aug 2 05:13:14 2006 From: Pedro.CorteReal at iantt.pt (Pedro =?ISO-8859-1?Q?C=F4rte-Real?=) Date: Wed, 02 Aug 2006 10:13:14 +0100 Subject: [Ferret-talk] Sorting performance In-Reply-To: References: <1154340332.5397.2.camel@localhost.localdomain> <1154358696.5397.14.camel@localhost.localdomain> <1154424965.757.7.camel@localhost.localdomain> <1154426904.757.16.camel@localhost.localdomain> Message-ID: <1154509995.757.37.camel@localhost.localdomain> On Wed, 2006-08-02 at 11:13 +0900, David Balmain wrote: > On 8/1/06, Pedro Côrte-Real wrote: > > On Tue, 2006-08-01 at 18:59 +0900, David Balmain wrote: > > > Hmmm. Sounds like an interesting application. One solution would be to > > > cache the sort index on disk. The problem with this is that the cache > > > would still need to be recalculated every time you add more documents > > > to the index so you'll still have the long wait occasionally. I'll > > > look into it anyway at a later stage. > > > > For my application this wouldn't really be a problem since data is only > > loaded maybe once a week. But does the cache need to be recalculated > > completely? Database indexes work incrementally. > > Have you tried optimizing your index? I found an order of magnitude > difference in speed here with an optimized index. Even with 1,000,000 > unique documents though sorting is taking less than 10 seconds for an > unoptimized index and less than 1 second for optimized index. What > kind of system are you running on? I had assumed acts_as_ferret did that, but apparently it only does so on rebuild_index. I'll try adding an optimize call at the start of the app. I'm running this on a 2.66 GHz Celeron with 1 GB of RAM. Pedro.
From wintonius at gmail.com Wed Aug 2 05:14:41 2006 From: wintonius at gmail.com (Winton) Date: Wed, 2 Aug 2006 11:14:41 +0200 Subject: [Ferret-talk] Model still using mysql Message-ID: <6950df95a26a35f6ec42cc76fa3337fd@ruby-forum.com> When calling a Result.find_by_content on my model, I get the following error: Mysql::Error: Table 'db.results' doesn't exist: SHOW FIELDS FROM results Yes, I remembered to put acts_as_ferret. Any ideas? Trace: /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/abstract_adapter.rb:120:in `log' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:185:in `execute' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:293:in `columns' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/base.rb:696:in `columns' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/calculations.rb:213:in `column_for' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/calculations.rb:135:in `calculate' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/calculations.rb:64:in `count' #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:249:in `rebuild_index' -- Posted via http://www.ruby-forum.com/. 
From kraemer at webit.de Wed Aug 2 05:20:51 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 2 Aug 2006 11:20:51 +0200 Subject: [Ferret-talk] Model still using mysql In-Reply-To: <6950df95a26a35f6ec42cc76fa3337fd@ruby-forum.com> References: <6950df95a26a35f6ec42cc76fa3337fd@ruby-forum.com> Message-ID: <20060802092051.GI26391@cordoba.webit.de> On Wed, Aug 02, 2006 at 11:14:41AM +0200, Winton wrote: > When calling a Result.find_by_content on my model, I get the following > error: > > Mysql::Error: Table 'db.results' doesn't exist: SHOW FIELDS FROM results > > Yes, I remembered to put acts_as_ferret. Any ideas? in fact, acts_as_ferret depends on your model being an ActiveRecord::Base child class represented by a database table. For example, it may need to look up what fields your model has. Besides that, in your case the index doesn't seem to exist, so aaf tries to build one from the Result records in the database - which fails. Jens > > > Trace: > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/abstract_adapter.rb:120:in > `log' > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:185:in > `execute' > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:293:in > `columns' > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/base.rb:696:in > `columns' > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/calculations.rb:213:in > `column_for' > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/calculations.rb:135:in > `calculate' > 
/Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/calculations.rb:64:in > `count' > #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:249:in > `rebuild_index' > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From JanPrill at blauton.de Wed Aug 2 05:23:08 2006 From: JanPrill at blauton.de (Jan Prill) Date: Wed, 2 Aug 2006 11:23:08 +0200 Subject: [Ferret-talk] Model still using mysql In-Reply-To: <6950df95a26a35f6ec42cc76fa3337fd@ruby-forum.com> References: <6950df95a26a35f6ec42cc76fa3337fd@ruby-forum.com> Message-ID: <562a35c10608020223h1ff3ea6fr3a793eeca793e75f@mail.gmail.com> Hi Winton, that's right. find_by_content finds a Ferret document. The MySQL id is extracted from that document and ActiveRecord then loads the model from the db. With Ferret, full-text searches are an order of magnitude faster than searching an InnoDB table with LIKE when there are many records in your db, and you get the Ferret query language as well. If you don't want to rely on the db at all you need to work with the Ferret documents directly. That's possible as well. Cheers, Jan On 8/2/06, Winton wrote: > > When calling a Result.find_by_content on my model, I get the following > error: > > Mysql::Error: Table 'db.results' doesn't exist: SHOW FIELDS FROM results > > Yes, I remembered to put acts_as_ferret. Any ideas?
> > > > Trace: > > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord- > 1.14.2/lib/active_record/connection_adapters/abstract_adapter.rb:120:in > `log' > > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord- > 1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:185:in > `execute' > > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord- > 1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:293:in > `columns' > > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord- > 1.14.2/lib/active_record/base.rb:696:in > `columns' > > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord- > 1.14.2/lib/active_record/calculations.rb:213:in > `column_for' > > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord- > 1.14.2/lib/active_record/calculations.rb:135:in > `calculate' > > /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/activerecord- > 1.14.2/lib/active_record/calculations.rb:64:in > `count' > #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:249:in > `rebuild_index' > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060802/54fa91cb/attachment-0001.html From bk at benjaminkrause.com Wed Aug 2 05:40:28 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Wed, 02 Aug 2006 11:40:28 +0200 Subject: [Ferret-talk] too many clauses exception Message-ID: <44D0730C.7040007@benjaminkrause.com> hey.. 
I get this error quite regularly. What exactly does it mean? : Error occured at :54 Error: exception 6 not handled: Too many clauses Am I adding too many clauses in the query statement? Ben From kraemer at webit.de Wed Aug 2 06:06:18 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 2 Aug 2006 12:06:18 +0200 Subject: [Ferret-talk] too many clauses exception In-Reply-To: <44D0730C.7040007@benjaminkrause.com> References: <44D0730C.7040007@benjaminkrause.com> Message-ID: <20060802100618.GJ26391@cordoba.webit.de> On Wed, Aug 02, 2006 at 11:40:28AM +0200, Benjamin Krause wrote: > hey.. > > i get this error quite regularly, what exactly does it mean? > > : Error occured at :54 > Error: exception 6 not handled: Too many clauses > > am i adding too many clauses in the query statement? This can happen with wildcard queries, which can expand into lots of Boolean clauses. For example, searching for A* produces a BooleanQuery with one clause for every term in your index that starts with 'A'. You can override the default limit of 1024 by setting the max_clause_count variable on the Boolean query in question. That is difficult if the query is generated by the query parser; in that case you could override BooleanQuery#max_clause_count to return a higher value. Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From Pedro.CorteReal at iantt.pt Wed Aug 2 06:23:59 2006 From: Pedro.CorteReal at iantt.pt (Pedro =?ISO-8859-1?Q?C=F4rte-Real?=) Date: Wed, 02 Aug 2006 11:23:59 +0100 Subject: [Ferret-talk] Problem importing lots of records In-Reply-To: References: <1154431774.757.24.camel@localhost.localdomain> <1154436580.757.28.camel@localhost.localdomain> <1154440951.757.30.camel@localhost.localdomain> <1154448386.757.32.camel@localhost.localdomain> <1154448888.757.35.camel@localhost.localdomain> Message-ID: <1154514239.757.43.camel@localhost.localdomain> On Wed, 2006-08-02 at 08:41 +0900, David Balmain wrote: > This would usually be enough but there seems to be some weird stuff > happening there. For example, argv = NULL on line #18 which doesn't > seem possible to me. On line #14 iw gets a strange value but then it > gets restored on line #13. Anyway, I think the error is occurring > higher up on the stack so maybe it doesn't matter. I'll keep trying to > work out what is going on but it'd be a lot easier if you could give > me something I could run here to reproduce the error. I have a standalone test case that blows up predictably every time. Since it involves some internal data I have to clear it with the powers that be. Shouldn't be a problem. As soon as that's done I'll mail it privately to you. Shouldn't take too long.
Meanwhile here's a different backtrace: #0 0xffffe410 in __kernel_vsyscall () #1 0xb7d469a1 in raise () from /lib/tls/i686/cmov/libc.so.6 #2 0xb7d482b9 in abort () from /lib/tls/i686/cmov/libc.so.6 #3 0xb7d7a87a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6 #4 0xb7d80fd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7d8134a in free () from /lib/tls/i686/cmov/libc.so.6 #6 0xb7b8dcae in tb_destroy (tb=0x0) at term.c:91 #7 0xb7b8dffc in te_destroy (te=0x49) at term.c:214 #8 0xb7b8e345 in ste_close (te=0x821b978) at term.c:326 #9 0xb7b86b2b in smi_destroy (smi=0x823e670) at index_rw.c:1984 #10 0xb7b88661 in sm_merge_term_infos (sm=0x80d2ba0) at index_rw.c:2282 #11 0xb7b88827 in sm_merge_terms (sm=0x80d2ba0) at index_rw.c:2314 #12 0xb7b88fc2 in sm_merge (sm=0x80d2ba0) at index_rw.c:2384 #13 0xb7b8906c in iw_merge_segments_with_max (iw=0x80a8450, min_segment=12, max_segment=17) at index_rw.c:829 #14 0xb7b892a7 in iw_merge_segments (iw=0x6, min_segment=0) at index_rw.c:860 #15 0xb7b89311 in iw_flush_ram_segments (iw=0x80a8450) at index_rw.c:918 #16 0xb7b89376 in iw_close (iw=0x80a8450) at index_rw.c:965 #17 0xb7b9ab93 in index_destroy (self=0x80fd6c0) at ind.c:90 #18 0xb7b7679b in frt_ind_free (p=0x80fd6c0) at r_search.c:1677 #19 0xb7ef08de in rb_gc_call_finalizer_at_exit () from /usr/lib/libruby1.8.so.1.8 #20 0xb7ed28be in is_ruby_native_thread () from /usr/lib/libruby1.8.so.1.8 #21 0xb7eea0d3 in ruby_cleanup () from /usr/lib/libruby1.8.so.1.8 #22 0xb7eea1b3 in ruby_stop () from /usr/lib/libruby1.8.so.1.8 #23 0xb7eea84e in ruby_run () from /usr/lib/libruby1.8.so.1.8 #24 0x080485dc in main () Looks much cleaner since it's actually a full one while the other was full of ruby's thread handling that I truncated. Probably because it was running inside a full rails environment. This time I ripped everything out and it's just ferret calls. Pedro. 
From wintonius at gmail.com Wed Aug 2 19:54:57 2006 From: wintonius at gmail.com (Winton) Date: Thu, 3 Aug 2006 01:54:57 +0200 Subject: [Ferret-talk] Model still using mysql In-Reply-To: <6950df95a26a35f6ec42cc76fa3337fd@ruby-forum.com> References: <6950df95a26a35f6ec42cc76fa3337fd@ruby-forum.com> Message-ID: Ah yes, I failed to read that it updates ActiveRecord also. How do I turn ActiveRecord off? -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Aug 2 22:02:53 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 3 Aug 2006 11:02:53 +0900 Subject: [Ferret-talk] too many clauses exception In-Reply-To: <20060802100618.GJ26391@cordoba.webit.de> References: <44D0730C.7040007@benjaminkrause.com> <20060802100618.GJ26391@cordoba.webit.de> Message-ID: On 8/2/06, Jens Kraemer wrote: > On Wed, Aug 02, 2006 at 11:40:28AM +0200, Benjamin Krause wrote: > > hey.. > > > > i get this error quite regularly, what exactly dies it mean? > > > > : Error occured at :54 > > Error: exception 6 not handled: Too many clauses > > > > am i adding to many clauses in the query statement? > > this can happen with wild card queries, which can result in > lots of Boolean clauses, i.e. searching for A* will give a booleanquery > having a clause for every term in your index that starts with 'A'. > > You can override the default limit of 1024 by setting the > max_clause_count variable in the boolean query in question (difficult if > the query is generated by the query parser, in that case you could > override BooleanQuery#max_clause_count to return a higher value). > > Jens In future these queries will be rewritten to MultiTermQueries which have much better performance than BooleanQueries. You still have the same problem though. You'll need to set the max_term_count. There will be a :max_term_count parameter on the QueryParser also. 
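To make the clause limit discussed in this thread concrete, here is a toy model in plain Ruby. It is not Ferret's implementation: it only shows how a prefix query can expand into one clause per matching term and fail once a configurable limit is exceeded. The names TooManyClauses, expand_prefix, and max_clauses are illustrative assumptions; only the default of 1024 comes from the thread.

```ruby
# Toy model of wildcard expansion; not Ferret's actual code.
# TooManyClauses, expand_prefix and max_clauses are hypothetical names.
class TooManyClauses < StandardError; end

# Expand a prefix such as "a" (for the query "a*") into one clause per
# indexed term that starts with it, refusing to exceed max_clauses.
def expand_prefix(terms, prefix, max_clauses = 1024)
  clauses = terms.select { |term| term.start_with?(prefix) }
  if clauses.size > max_clauses
    raise TooManyClauses, "#{clauses.size} clauses exceeds limit of #{max_clauses}"
  end
  clauses
end

# A pretend index with 2000 distinct terms, a0001 through a2000.
terms = (1..2000).map { |i| format("a%04d", i) }

small = expand_prefix(terms, "a000")            # narrow prefix, few clauses

begin
  expand_prefix(terms, "a")                     # matches every term
  overflowed = false
rescue TooManyClauses
  overflowed = true
end

raised_limit = expand_prefix(terms, "a", 4096)  # fits once the limit is raised
```

The narrow prefix stays well under the limit, the broad one trips it, and raising the limit lets the same query through at the cost of a much larger query.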
Dave From fraiha at charter.net Wed Aug 2 23:47:09 2006 From: fraiha at charter.net (Adrian Fraiha) Date: Thu, 3 Aug 2006 05:47:09 +0200 Subject: [Ferret-talk] Grouping results. Message-ID: What would be the best way to go about grouping results? For example, if two children contain the word "test", how would I get it to display their parent just once? Thanks for the help! -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu Aug 3 05:17:19 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 3 Aug 2006 11:17:19 +0200 Subject: [Ferret-talk] Model still using mysql In-Reply-To: References: <6950df95a26a35f6ec42cc76fa3337fd@ruby-forum.com> Message-ID: <20060803091719.GA21809@cordoba.webit.de> Hi! On Thu, Aug 03, 2006 at 01:54:57AM +0200, Winton wrote: > Ah yes, I failed to read that it updates ActiveRecord also. > > How do I turn ActiveRecord off? If you don't want to use ActiveRecord, acts_as_ferret won't be of much use to you. aaf is all about synchronizing an AR model with a Ferret index. That is, it's tied to database create/update operations and will update the Ferret index according to the changes made to the database. If your model isn't backed by a database, you should use Ferret directly, as described at http://ferret.davebalmain.com/api/files/TUTORIAL.html Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From jordan.w.frank at gmail.com Thu Aug 3 10:53:35 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Thu, 3 Aug 2006 10:53:35 -0400 Subject: [Ferret-talk] Index.optimize Message-ID: In the documentation, it says that optimize "should only be called when the index will no longer be updated very often, but will be read a lot". Does this mean it actually has a detrimental impact on updates and inserts?
In my project there will be many more reads than updates, but there will still be a lot of updates. So should I be calling Optimize once a day or something like that, during a low traffic time, or is that going to make updates slower? What are your recommendations for optimizing the index for a high-traffic site? Does optimizing the index lock the index for a fair period of time if there are 100s of thousands of documents? Will read requests be denied during the optimization process? -- Cheers, Jordan Frank jordan.w.frank at gmail.com From Pedro.CorteReal at iantt.pt Thu Aug 3 11:24:17 2006 From: Pedro.CorteReal at iantt.pt (Pedro =?ISO-8859-1?Q?C=F4rte-Real?=) Date: Thu, 03 Aug 2006 16:24:17 +0100 Subject: [Ferret-talk] Sorting performance In-Reply-To: References: <1154340332.5397.2.camel@localhost.localdomain> <1154358696.5397.14.camel@localhost.localdomain> <1154424965.757.7.camel@localhost.localdomain> <1154426904.757.16.camel@localhost.localdomain> Message-ID: <1154618657.757.62.camel@localhost.localdomain> On Wed, 2006-08-02 at 11:13 +0900, David Balmain wrote: > Have you tried optimizing your index? I found an order of magnitude > difference in speed here with an optimized index. Even with 1,000,000 > unique documents though sorting is taking less than 10 seconds for an > unoptimized index and less than 1 second for optimized index. What > kind of system are you running on? You were right. I benchmarked it at about 10x faster to preload the indexes, even counting the time to run #optimize. Thanks for the tip. Pedro. From dbalmain.ml at gmail.com Thu Aug 3 12:43:54 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 4 Aug 2006 01:43:54 +0900 Subject: [Ferret-talk] Index.optimize In-Reply-To: References: Message-ID: On 8/3/06, Jordan Frank wrote: > In the documentation, it says that optimize "should only be called > when the index will no longer be updated very often, but will be read > a lot". 
Does this mean it actually has a detrimental impact on updates > and inserts? Hi Jordan, Optimizing the index won't affect inserts and will actually make updates a little faster. The problem is the time it takes to optimize the index. Optimization is an expensive process so it should be avoided when doing a lot of indexing. > In my project there will be many more reads than updates, > but there will still be a lot of updates. So should I be calling > Optimize once a day or something like that, during a low traffic time, > or is that going to make updates slower? The only detrimental performance impact optimizing has is the time it takes to actually perform the optimization. > What are your recommendations > for optimizing the index for a high-traffic site? Does optimizing the > index lock the index for a fair period of time if there are 100s of > thousands of documents? Will read requests be denied during the > optimization process? Yes, optimization can hold the lock for a long period of time, but an IndexWriter holds a write lock on the index the whole time it is open anyway. The IndexWriter won't affect the performance of IndexReaders. You can have as many processes as you like reading the index at the same time and it won't matter how you are writing to the index. You should keep in mind that the IndexReaders need to be refreshed (closed and opened again) to make use of the latest data added to the index. The Index class can be set up to handle this for you but if you are concerned about performance you should stick to using the IndexWriter and IndexReader classes or at least understand how the Index class makes use of those two classes (check out the Ruby source for the Index class). So the basic answer to your question is: optimize whenever you have the CPU cycles available to perform the optimization process. Optimizing the index won't negatively affect the performance of the index in any way. Please let me know if I haven't been clear on anything.
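The point about readers needing to be closed and reopened can be pictured with a small toy model in plain Ruby. This is only a sketch of the behaviour described above, not Ferret's API: ToyIndex, ToyReader, commit, and reopen are hypothetical names, and the "index" is just an array of document names.

```ruby
# Toy model of reader snapshots; not Ferret's API.
# ToyIndex, ToyReader, commit and reopen are hypothetical names.
class ToyIndex
  attr_reader :committed

  def initialize
    @committed = [].freeze   # documents visible to newly opened readers
  end

  # A writer publishes its changes by swapping in a new frozen snapshot.
  def commit(docs)
    @committed = (@committed + docs).freeze
  end
end

class ToyReader
  def initialize(index)
    @index = index
    reopen
  end

  # Closing and reopening a reader picks up the latest committed snapshot.
  def reopen
    @snapshot = @index.committed
  end

  def count
    @snapshot.size
  end
end

index = ToyIndex.new
index.commit(["doc1", "doc2"])
reader = ToyReader.new(index)

index.commit(["doc3"])        # the writer adds a document
stale_count = reader.count    # the open reader still sees its old snapshot
reader.reopen
fresh_count = reader.count    # after reopening, the new document is visible
```

In this model the writer never blocks the reader; the reader simply keeps serving the snapshot it opened until it is refreshed, which is the trade-off between freshness and reopen cost mentioned above.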
Cheers, Dave From jordan.w.frank at gmail.com Thu Aug 3 15:55:04 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Thu, 3 Aug 2006 15:55:04 -0400 Subject: [Ferret-talk] Index.optimize In-Reply-To: References: Message-ID: On 8/3/06, David Balmain wrote: > On 8/3/06, Jordan Frank wrote: > > In the documentation, it says that optimize "should only be called > > when the index will no longer be updated very often, but will be read > > a lot". Does this mean it actually has a detrimental impact on updates > > and inserts? > > Hi Jordan, > > Optimizing the index won't effect inserts and will actually make > updates a little faster. The problem is the time it takes to optimize > the index. Optimization is an expensive process so it should be > avoided when doing a lot of indexing. > > > In my project there will be many more reads than updates, > > but there will still be a lot of updates. So should I be calling > > Optimize once a day or something like that, during a low traffic time, > > or is that going to make updates slower? > > The only detrimental performance impact optimizing has is the time it > takes to actually perform the optimization. > > > What are your recommendations > > for optimizing the index for a high-traffic site? Does optimizing the > > index lock the index for a fair period of time if there are 100s of > > thousands of documents? Will read requests be denied during the > > optimization process? > > Yes, optimization can hold the lock for a long period of time, but an > IndexWriter has a write-lock open on the index the whole time it is > open. The IndexWriter won't effect the performance of IndexReaders. > You can have as many processes as you like reading the index at the > same time and it won't matter how you are writing to the index. You > should keep in mind that the IndexReaders need to be refreshed (closed > and opened again) to make use of latest data added to the index. 
The > Index class can be setup to handle this for you but if you are > concerned about performance you should stick to using the IndexWriter > and IndexReader classes or at least understand how the Index class > makes use of those two classes (check out the Ruby source for the > Index class). > > So the basic answer to your question is; optimize whenever you have > the cpu cycles available to perform the opimization process. > Optimizing the index won't negatively effect the performance of the > index in any way. > > Please let me know if I haven't been clear on anything. > > Cheers, > Dave > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > Crystal clear, thank you very much. I definitely intend on getting more into the internals of Ferret to understand how it really works, but for now I need to get something going quickly, so your answer is just what I needed. Right now I'm really just using acts_as_ferret but modifying it slightly so that the updates get queued and then all happen periodically, instead of when a record is updated...Eventually I would like to bypass acts_as_ferret altogether (not that it isn't a great tool, I just want to have more control) and use Ferret directly. Thanks for your help. -- Cheers, Jordan Frank jordan.w.frank at gmail.com From samuelgiffney at gmail.com Thu Aug 3 23:03:24 2006 From: samuelgiffney at gmail.com (Sam Giffney) Date: Fri, 4 Aug 2006 05:03:24 +0200 Subject: [Ferret-talk] Mongrel Cluster Compatibility Message-ID: Is anyone using ferret with Mongrel/Mongrel-cluster? 
The first one or two times I access the ferret index it works fine, but then it throws a write lock error StandardError (: Error occured at :703 Error: exception 6 not handled: Could not obtain write lock when trying to write index ): I need to do more testing on this to narrow down the problem/solution but just wanted to throw out the question to see if anyone was using this already successfully or if someone (Dave? :) knew why ferret might choke on this setup. I've been developing using lighttpd as a server without any problems -Debian sarge built like http://brainspl.at/articles/2005/11/13/the-perfect-lightweight-rails-lighttpd-debian-install but I've just started testing on a production server -Debian sarge with Apache 2.2, mod_proxy_balancer and Mongrel built like http://forums.rimuhosting.com/forums/showthread.php?t=230 and http://blog.innerewut.de/articles/2006/04/21/scaling-rails-with-apache-2-2-mod_proxy_balancer-and-mongrel (I put in the links in case anyone else is interested in deploying their own rails server - these links have been gold to me) Sam -- Posted via http://www.ruby-forum.com/. From shad at liquidcultures.com Fri Aug 4 05:11:34 2006 From: shad at liquidcultures.com (Shad Reynolds) Date: Fri, 4 Aug 2006 11:11:34 +0200 Subject: [Ferret-talk] incorrect checksum for freed object? Message-ID: I'm using ferret (0.9.4) in rails, but outside of the "acts_as_ferret" plugin. Whenever I use a QueryFilter (even a very simple one), the server will crash after one, two, or three reloads of a page (same page, same query, same filter). It's very non-deterministic and I can't seem to reproduce it outside of my application environment (I can't get it to fail in my unit tests). The error always occurs AFTER my controller method returns (not when I make the calls). Also, the error does not always look the same. Sometimes it ends in a Bus Error, other times a Segmentation Fault, but always "incorrect checksum for freed object". This started when I migrated to 0.9.4 from 0.3.2 Thoughts?
Thanks, Shad ---- ruby(1895,0xa000ed98) malloc: *** error for object 0x266b1f0: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug ruby(1895,0xa000ed98) malloc: *** set a breakpoint in szone_error to debug ruby(1895,0xa000ed98) malloc: *** error for object 0x266ace0: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug ruby(1895,0xa000ed98) malloc: *** set a breakpoint in szone_error to debug /usr/lib/ruby/1.8/timeout.rb:40: [BUG] Segmentation fault ruby 1.8.2 (2004-12-25) [powerpc-darwin8.0] Abort trap ----- ruby(1883,0xa000ed98) malloc: *** error for object 0x2198160: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug ruby(1883,0xa000ed98) malloc: *** set a breakpoint in szone_error to debug /usr/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/webrick_server.rb:148: [BUG] Bus Error ruby 1.8.2 (2004-12-25) [powerpc-darwin8.0] Abort trap ---- -- http://www.ShadReynolds.com http://www.flickr.com/photos/shadreynolds/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060804/a94b43a0/attachment.html From dbalmain.ml at gmail.com Fri Aug 4 05:27:03 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 4 Aug 2006 18:27:03 +0900 Subject: [Ferret-talk] Mongrel Cluster Compatibility In-Reply-To: References: Message-ID: On 8/4/06, Sam Giffney wrote: > Is anyone using ferret with Mongrel/Mongrel-cluster? 
> > The first one or two times I access the ferret index it works fine, > but then it throws a write lock error > StandardError (: Error occured at :703 > Error: exception 6 not handled: Could not obtain write lock when > trying to write index > ): > > I need to do more testing on this to narrow down the problem/solution > but just wanted to throw out the question to see if anyone was using > this already succesfully or if someone (Dave? :) knew why ferret might > choke on this setup. > > I've been developing using lighttpd as a server without any problems > -Debian sarge built like > http://brainspl.at/articles/2005/11/13/the-perfect-lightweight-rails-lighttpd-debian-install > > but I've just started testing on a production server > -Debian sarge with with Apache 2.2, mod_proxy_balancer and Mongrel built like > http://forums.rimuhosting.com/forums/showthread.php?t=230 > and > http://blog.innerewut.de/articles/2006/04/21/scaling-rails-with-apache-2-2-mod_proxy_balancer-and-mongrel > > (I put in the links in case anyone else is interested in deploying > their own rails server - these links have been gold to me) Hi Sam, I haven't looked over the links so I'm not sure of the exact difference between your dev setup and your production setup. However, I can explain why you are probably getting locking errors. Ferret only allows one process to be writing to an index at a time. Once you have an IndexWriter open on an index it will obtain a write lock on the index and you won't be able to open another IndexWriter or delete documents with an IndexReader. Solution? You can partly work around this problem by using the Index class and setting the :auto_flush parameter to true. If you are concerned about performance, though, you are better off with just the one process writing to the index. Does that make sense?
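The single-writer rule can be illustrated with a lock file in plain Ruby. This is a minimal sketch under assumptions: the helper names and the "write.lock" file name are hypothetical, and Ferret's real locking code may work differently. It only shows why a second writer is refused while the first one still holds the lock, which is the situation several Mongrel processes sharing one index would run into.

```ruby
require "tmpdir"

# Toy lock-file sketch; Ferret's real locking code may differ.
# acquire_write_lock, release_write_lock and "write.lock" are hypothetical.
def acquire_write_lock(index_dir)
  path = File.join(index_dir, "write.lock")
  # CREAT|EXCL fails if the file already exists, so only one caller wins.
  File.open(path, File::WRONLY | File::CREAT | File::EXCL) do |f|
    f.write(Process.pid.to_s)
  end
  path
rescue Errno::EEXIST
  nil   # someone else already holds the lock
end

def release_write_lock(path)
  File.delete(path)
end

results = Dir.mktmpdir do |dir|
  first  = acquire_write_lock(dir)   # the first writer gets the lock
  second = acquire_write_lock(dir)   # a second writer is refused
  release_write_lock(first)
  third  = acquire_write_lock(dir)   # succeeds once the lock is released
  [!first.nil?, second.nil?, !third.nil?]
end
```

Holding the lock only for the duration of each write, as in the release step here, is the same idea as flushing after every update: each process gets its turn, at the cost of acquiring and releasing the lock on every write.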
Cheers, Dave From shad at liquidcultures.com Fri Aug 4 09:10:44 2006 From: shad at liquidcultures.com (Shad Reynolds) Date: Fri, 4 Aug 2006 15:10:44 +0200 Subject: [Ferret-talk] incorrect checksum for freed object? In-Reply-To: References: Message-ID: I've just done some more testing and I'm getting odd behaviour. It appears there's some sort of odd race condition. The following code runs on my machine perfectly (powerbook g4):
----
def test_index
  options = {:dir=>Ferret::Store::FSDirectory.new( $index_dir, false ), :auto_flush=>true}
  idx = Ferret::Index::Index.new( options )
  begin
    query = Ferret::Search::BooleanQuery.new
    query.add_query Ferret::Search::TermQuery.new( Ferret::Index::Term.new( "name", "type") )
    filter = Ferret::Search::QueryFilter.new( query )
    idx.search_each( "name:type", :filter=>filter ) do |doc,score|
      puts idx[doc]["name"]
    end
  ensure
    idx.close
  end
end
----
BUT, if I insert an extra line most anywhere (just before the last "end" for instance) then I get the following error:
----
ruby(2776) malloc: *** error for object 0x2679330: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug
ruby(2776) malloc: *** set a breakpoint in szone_error to debug
----
I have not modified the logic of this test case AT ALL, I only added one extra line. Also, this is reproducible.
- add extra line: fail, fail, fail
- remove line: works, works, works
- add extra line: fail, fail
etc.
Not sure what to do at this point, so I will most likely roll back to the 0.3.2 version I was using before. Thanks, Shad On 8/4/06, Shad Reynolds wrote: > > I'm using ferret (0.9.4) in rails, but outside of the "acts_as_ferret" plugin. Whenever I use a QueryFilter (even a very simple one), the server will crash after one, two, or three reloads of a page (same page, same query, same filter).
It's very non-deterministic and I can't seem to reproduce it outside of my application environment (I can't get it to fail in my unit tests). > > The error always occurs AFTER my controller method returns (not when I make the calls). Also, the error does not always look the same. Sometimes it ends in a Bus Error, other times a Segmentation Fault, but always "incorrect checksum for freed object". > > This started when I migrated to 0.9.4 from 0.3.2 > > Thoughts? > > Thanks, > Shad > > ---- > > ruby(1895,0xa000ed98) malloc: *** error for object 0x266b1f0: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug > ruby(1895,0xa000ed98) malloc: *** set a breakpoint in szone_error to debug > ruby(1895,0xa000ed98) malloc: *** error for object 0x266ace0: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug > ruby(1895,0xa000ed98) malloc: *** set a breakpoint in szone_error to debug > /usr/lib/ruby/1.8/timeout.rb:40: [BUG] Segmentation fault > ruby 1.8.2 (2004-12-25) [powerpc-darwin8.0] > > Abort trap > > ----- > > ruby(1883,0xa000ed98) malloc: *** error for object 0x2198160: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug > ruby(1883,0xa000ed98) malloc: *** set a breakpoint in szone_error to debug > /usr/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/webrick_server.rb:148: [BUG] Bus Error > ruby 1.8.2 (2004-12-25) [powerpc-darwin8.0] > > Abort trap > > ---- > > -- > http://www.ShadReynolds.com > http://www.flickr.com/photos/shadreynolds/ > -- http://www.ShadReynolds.com http://www.flickr.com/photos/shadreynolds/ From atomgiant at gmail.com Fri Aug 4 11:40:54 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 4 Aug 2006 11:40:54 -0400 Subject: [Ferret-talk] A couple of ferret 0.9.4 exceptions Message-ID: Hi Dave, I am using ferret at my site http://gifthat.com and I just had a few 
exceptions pop up. I don't have a way to reproduce them, but my site just was listed on lifehacker.com and these issues have popped up under multiple concurrent users (only twice though which I think isn't too bad). I am using two lighttpd instances both with read/write access to the index: 1) Error occured at :318 Error: exception 2 not handled: Couldn't open the file to read This happened while adding a document to the index like so: gift_index << self.to_doc 2) Error occured at :2642 Error: exception 6 not handled: Could not obtain commit lock when trying to write index This happened while doing a search_each like so: gift_index.search_each The weird thing is that I would think the search each doesn't need to write to the index. Both of these issues appear to have happened at the same time, so they may be related to each other. Thanks for your excellent work on Ferret! Please let me know if you need any more info from me. Tom Davies http://atomgiant.com http://gifthat.com From kraemer at webit.de Fri Aug 4 15:43:51 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 4 Aug 2006 21:43:51 +0200 Subject: [Ferret-talk] Ruby/Gtk Luke port Message-ID: <20060804194351.GC16112@cordoba.webit.de> Hi all, some days ago I wrote that I once had started porting Luke to Ferret with Ruby/Gtk. I just dug out those sources and put them under version control. It's far from finished and my first Gtk program, but might be a good start anyway. the code is available at svn://projects.jkraemer.net/inspector/trunk/ If anybody wants to contribute, I'll be glad to grant commit rights. Please note that I'm on vacation (that is, offline most of the time) for the next few days and therefore won't be able to respond to emails that frequently. so long, Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From samuelgiffney at gmail.com Fri Aug 4 17:52:12 2006 From: samuelgiffney at gmail.com (Sam) Date: Fri, 4 Aug 2006 23:52:12 +0200 Subject: [Ferret-talk] Mongrel Cluster Compatibility In-Reply-To: References: Message-ID: <5811e39c9403b9b6cadb51a577e451ac@ruby-forum.com> > Solution? You can kind of get around this problem by using the Index > class and setting the :auto_flush parameter to true. If you are > concerned about performance though you are better off with just the > one process writing to the index. > > Does that make sense? Yes. Mongrel_cluster uses a different pid for each mongrel instance. I don't think there would be any way to specify which instance the load balancer uses based on the action so... :auto_flush seems the way to go. The application should be 99.9% about reading so hopefully the performance hit won't be significant. I'll have to test and see. Cheers Dave. -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Fri Aug 4 21:33:15 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 5 Aug 2006 10:33:15 +0900 Subject: [Ferret-talk] Mongrel Cluster Compatibility In-Reply-To: <5811e39c9403b9b6cadb51a577e451ac@ruby-forum.com> References: <5811e39c9403b9b6cadb51a577e451ac@ruby-forum.com> Message-ID: On 8/5/06, Sam wrote: > > Solution? You can kind of get around this problem by using the Index > > class and setting the :auto_flush parameter to true. If you are > > concerned about performance though you are better off with just the > > one process writing to the index. > > > > Does that make sense? > > Yes. Mongrel_cluster uses a different pid for each mongrel instance. I > don't think there would be any way to specify which instance the load > balancer uses based on the action so... :auto_flush seems the way to go.
> The application should be 99.9% about reading so hopefully the > performance hit won't be significant. I'll have to test and see. Cheers > Dave. Cool. If that solution doesn't work you can write a simple server using DRb that takes indexing requests. This shouldn't be too hard and will probably be added to a future version of Ferret. Dave From dbalmain.ml at gmail.com Fri Aug 4 21:51:21 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 5 Aug 2006 10:51:21 +0900 Subject: [Ferret-talk] A couple of ferret 0.9.4 exceptions In-Reply-To: References: Message-ID: On 8/5/06, Tom Davies wrote: > Hi Dave, > > I am using ferret at my site http://gifthat.com and I just had a few > exceptions pop up. I don't have a way to reproduce them, but my site > just was listed on lifehacker.com and these issues have popped up > under multiple concurrent users (only twice though which I think isn't > too bad). I am using two lighttpd instances both with read/write > access to the index: > > 1) Error occured at :318 > Error: exception 2 not handled: Couldn't open the file to read > > This happened while adding a document to the index like so: > > gift_index << self.to_doc I'm afraid there isn't much more I can do about this bug without more information. If you can help me reproduce it here I can usually fix it very quickly. > 2) Error occured at :2642 > Error: exception 6 not handled: Could not obtain commit lock when > trying to write index > > This happened while doing a search_each like so: > > gift_index.search_each > > The weird thing is that I would think the search each doesn't need to > write to the index. If you are using the Index class and you call a search method on an Index object that has just written to the index then the Index object will need to commit any changes before opening the IndexSearcher. It is possible that the commit lock was still hanging around after the first crash? 
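To make the point about searches needing a commit a little more concrete, here is a toy model in plain Ruby. It is only a sketch under stated assumptions: ToyIndex and CommitLockError are invented for this illustration and say nothing about how Ferret is actually implemented, and the shared Hash merely stands in for a commit lock file. It shows why a call that looks read-only can still fail with a lock error when there are uncommitted writes and a lock has been left behind.

```ruby
# Toy model only: none of these classes exist in Ferret.
class CommitLockError < StandardError; end

class ToyIndex
  def initialize(lock)
    @lock = lock        # shared Hash standing in for the commit lock file
    @pending = []       # documents added but not yet committed
    @committed = []     # documents visible to searches
  end

  # Adding a document only buffers it.
  def <<(doc)
    @pending << doc
    self
  end

  # A search first commits any buffered documents, which needs the lock.
  def search(term)
    commit unless @pending.empty?
    @committed.select { |doc| doc.include?(term) }
  end

  private

  def commit
    raise CommitLockError, 'could not obtain commit lock' if @lock[:held]
    @lock[:held] = true
    begin
      @committed.concat(@pending)
      @pending.clear
    ensure
      @lock[:held] = false   # release only a lock we took ourselves
    end
  end
end

lock = { held: false }
index = ToyIndex.new(lock)

index << 'red gift box'
puts index.search('gift').inspect    # commit succeeds, then the search runs

index << 'blue gift bag'
lock[:held] = true                   # simulate a lock left by a crashed process
begin
  index.search('gift')
rescue CommitLockError => e
  puts "search failed: #{e.message}"
end
```

In the toy model the second search fails even though it never intends to change anything itself, which matches the shape of the second exception reported in this thread.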
> Both of these issues appear to have happened at the same time, so they > may be related to each other. > > Thanks for your excellent work on Ferret! Please let me know if you > need any more info from me. Like I said earlier, if you can reproduce the error I may be able to help. Otherwise I'm fully committed to getting 0.10 out. When it is out, I'd recommend getting it into your development app as soon as you can. Although it may be a little less stable than the 0.9 series, it shouldn't be long before it is more stable. Cheers, Dave From dbalmain.ml at gmail.com Fri Aug 4 22:27:07 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 5 Aug 2006 11:27:07 +0900 Subject: [Ferret-talk] Ruby/Gtk Luke port In-Reply-To: <20060804194351.GC16112@cordoba.webit.de> References: <20060804194351.GC16112@cordoba.webit.de> Message-ID: On 8/5/06, Jens Kraemer wrote: > Hi all, > > some days ago I wrote that I once had started porting Luke to Ferret > with Ruby/Gtk. I just dug out those sources and put them under version > control. > > It's far from finished and my first Gtk program, but might be a good > start anyway. > > the code is available at > svn://projects.jkraemer.net/inspector/trunk/ > > If anybody wants to contribute, I'll be glad to grant commit rights. > Please note that I'm on vacation (that is, offline most of the time) > for the next few days and therefore won't be able to respond to emails > that frequently. > > so long, > Jens Great work Jens. Have a great vacation. For anyone who is wondering, here is how I got it working on Ubuntu;
>$ sudo apt-get install libgtk2-ruby
>$ svn co svn://projects.jkraemer.net/inspector/trunk/ inspector
>$ cd inspector
>$ ruby -Ilib bin/inspector /path/to/index
Btw, can anyone think of a better name for this than inspector?
Cheers, Dave From atomgiant at gmail.com Fri Aug 4 22:48:30 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 4 Aug 2006 22:48:30 -0400 Subject: [Ferret-talk] A couple of ferret 0.9.4 exceptions In-Reply-To: References: Message-ID: Thanks Dave. If I can isolate this issue I will get back to you with more information. It has happened a few more times in those exact same two places, but not enough to be a big deal for now. Thanks, Tom On 8/4/06, David Balmain wrote: > On 8/5/06, Tom Davies wrote: > > Hi Dave, > > > > I am using ferret at my site http://gifthat.com and I just had a few > > exceptions pop up. I don't have a way to reproduce them, but my site > > just was listed on lifehacker.com and these issues have popped up > > under multiple concurrent users (only twice though which I think isn't > > too bad). I am using two lighttpd instances both with read/write > > access to the index: > > > > 1) Error occured at :318 > > Error: exception 2 not handled: Couldn't open the file to read > > > > This happened while adding a document to the index like so: > > > > gift_index << self.to_doc > > I'm afraid there isn't much more I can do about this bug without more > information. If you can help me reproduce it here I can usually fix it > very quickly. > > > 2) Error occured at :2642 > > Error: exception 6 not handled: Could not obtain commit lock when > > trying to write index > > > > This happened while doing a search_each like so: > > > > gift_index.search_each > > > > The weird thing is that I would think the search each doesn't need to > > write to the index. > > If you are using the Index class and you call a search method on an > Index object that has just written to the index then the Index object > will need to commit any changes before opening the IndexSearcher. It > is possible that the commit lock was still hanging around after the > first crash? 
> > > Both of these issues appear to have happened at the same time, so they > > may be related to each other. > > > > Thanks for your excellent work on Ferret! Please let me know if you > > need any more info from me. > > Like I said earlier, if you can reproduce the error I may be able to > help. Otherwise I'm fully committed to getting 0.10 out. When it is > out, I'd recommend getting it into your development app as soon as you > can. Although it may be a little less stable then the 0.9 series, it > shouldn't be long before it is more stable. > > Cheers, > Dave > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Tom Davies http://atomgiant.com http://gifthat.com From bk at benjaminkrause.com Sat Aug 5 03:54:41 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Sat, 05 Aug 2006 09:54:41 +0200 Subject: [Ferret-talk] Ruby/Gtk Luke port In-Reply-To: <20060804194351.GC16112@cordoba.webit.de> References: <20060804194351.GC16112@cordoba.webit.de> Message-ID: <44D44EC1.3070400@benjaminkrause.com> Jens Kraemer schrieb: > some days ago I wrote that I once had started porting Luke to Ferret > with Ruby/Gtk. I just dug out those sources and put them under version > control. > hey .. can you give me a brief description, what this is all about? Ben From f at andreas-s.net Sat Aug 5 04:39:41 2006 From: f at andreas-s.net (Andreas Schwarz) Date: Sat, 5 Aug 2006 10:39:41 +0200 Subject: [Ferret-talk] Mongrel Cluster Compatibility In-Reply-To: References: <5811e39c9403b9b6cadb51a577e451ac@ruby-forum.com> Message-ID: <465e3b0568504d52812a7fdf224ba439@ruby-forum.com> David Balmain wrote: > On 8/5/06, Sam wrote: >> The application should be 99.9% about reading so hopefully the >> performance hit won't be significant. I'll have to test and see. Cheers >> Dave. > > Cool. 
If that solution doesn't work you can write a simple server > using DRb that takes indexing requests. Or use a background process that updates the index for changed records (updated_at > last update timestamp) periodically. -- Posted via http://www.ruby-forum.com/. From tjackiw at gmail.com Sat Aug 5 05:25:05 2006 From: tjackiw at gmail.com (Thiago Jackiw) Date: Sat, 5 Aug 2006 11:25:05 +0200 Subject: [Ferret-talk] Frustrating locale setting error Message-ID: <3fa9f451935e8ea1dc07f6efb8c8bbc2@ruby-forum.com> Hi all, This has been very frustrating for me trying to get this acts_as_ferret working well on a Fedora box. On my mac it works great, no problems with locale, but when I put the code live on my Fedora server, it complains about the locale setting (Error occured at :498 Error: exception 2 not handled: Error decoding input string. Check that you have the locale set correctly). I currently have the latest versions for both ferret and acts_as_ferret on the two boxes. I've tried all the suggestions I could find, setting the LANG on env.rb, tried both the examples for setting the utf on http://projects.jkraemer.net/acts_as_ferret/, which I couldn't even get to work due to "`const_missing': uninitialized constant TokenFilter (NameError)", and I even installed mysql 5.0.22 and setting the default charset on the table structure to utf thinking that would have something to do, but didn't. So here I'm, stuck, hoping you guys could help me out with this issue. Just FYI, the locale for "en_US" on the fedora box is en_US en_US.iso88591 en_US.iso885915 en_US.utf8 Thanks. -- Posted via http://www.ruby-forum.com/. 
From tjackiw at gmail.com Sat Aug 5 06:30:28 2006 From: tjackiw at gmail.com (Thiago Jackiw) Date: Sat, 5 Aug 2006 12:30:28 +0200 Subject: [Ferret-talk] Frustrating locale setting error In-Reply-To: <3fa9f451935e8ea1dc07f6efb8c8bbc2@ruby-forum.com> References: <3fa9f451935e8ea1dc07f6efb8c8bbc2@ruby-forum.com> Message-ID: Thiago Jackiw wrote: > Hi all, > > This has been very frustrating for me trying to get this acts_as_ferret > working well on a Fedora box. On my mac it works great, no problems with > locale, but when I put the code live on my Fedora server, it complains > about the locale setting (Error occured at :498 Error: > exception 2 not handled: Error decoding input string. Check that you > have the locale set correctly). I currently have the latest versions for > both ferret and acts_as_ferret on the two boxes. > > I've tried all the suggestions I could find, setting the LANG on env.rb, > tried both the examples for setting the utf on > http://projects.jkraemer.net/acts_as_ferret/, which I couldn't even get > to work due to "`const_missing': uninitialized constant TokenFilter > (NameError)", and I even installed mysql 5.0.22 and setting the default > charset on the table structure to utf thinking that would have something > to do, but didn't. > > So here I'm, stuck, hoping you guys could help me out with this issue. > > Just FYI, the locale for "en_US" on the fedora box is > > en_US > en_US.iso88591 > en_US.iso885915 > en_US.utf8 > > > Thanks. Well, just to let you guys know, instead of setting the LANG=utf on env.rb, I tried en_US.iso88591 and that seems to be working fine for now. -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sat Aug 5 10:23:55 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 5 Aug 2006 23:23:55 +0900 Subject: [Ferret-talk] A couple of ferret 0.9.4 exceptions In-Reply-To: References: Message-ID: On 8/5/06, Tom Davies wrote: > Thanks Dave. 
If I can isolate this issue I will get back to you with > more information. It has happened a few more times in those exact > same two places, but not enough to be a big deal for now. Thanks Tom. Even once is a big enough deal to be concerned as far as I'm concerned. From atomgiant at gmail.com Sat Aug 5 15:21:54 2006 From: atomgiant at gmail.com (Tom Davies) Date: Sat, 5 Aug 2006 15:21:54 -0400 Subject: [Ferret-talk] A couple of ferret 0.9.4 exceptions In-Reply-To: References: Message-ID: Yeah, I agree, it would be nice if it didn't happen at all. One question I had for you: Is there a way to run the c version of ferret on Windows? Since I do all of my development on Windows, that would be the only real way I could try to research this one. Thanks, Tom On 8/5/06, David Balmain wrote: > On 8/5/06, Tom Davies wrote: > > Thanks Dave. If I can isolate this issue I will get back to you with > > more information. It has happened a few more times in those exact > > same two places, but not enough to be a big deal for now. > > Thanks Tom. Even once is a big enough deal to be concerned as far as > I'm concerned. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Tom Davies http://atomgiant.com http://gifthat.com From dbalmain.ml at gmail.com Sat Aug 5 21:47:38 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 6 Aug 2006 10:47:38 +0900 Subject: [Ferret-talk] A couple of ferret 0.9.4 exceptions In-Reply-To: References: Message-ID: On 8/6/06, Tom Davies wrote: > Yeah, I agree, it would be nice if it didn't happen at all. > > One question I had for you: Is there a way to run the c version of > ferret on Windows? Since I do all of my development on Windows, that > would be the only real way I could try to research this one. Unfortunately no, not with the 0.9 series. Not without a fair bit of work anyway. 
But the 0.10 series on the other hand will be coming with a win32 gem. From contact at ezabel.com Sun Aug 6 13:07:39 2006 From: contact at ezabel.com (Ian Zabel) Date: Sun, 6 Aug 2006 19:07:39 +0200 Subject: [Ferret-talk] Return only results that user is allowed to see? Message-ID: Is it possible with acts_as_ferret to somehow restrict the results that are returned? For instance, I don't want to return results that are logically deleted with acts_as_paranoid (deleted_at IS NOT NULL and deleted_at < now()). Also, if a user is not an Admin, they should not be able to return results that have a certain value in a certain column, like forum_id != 13 (if 13 is an admin only forum). What's the best way to accomplish this? Do I need to include the deleted_id and forum_id columns in my index, and then pass in the appropriate search terms in the controller? Right now, the model I'm search is Comment, and I have this defined: acts_as_ferret :fields => [ 'comment' ] Any tips? -- Posted via http://www.ruby-forum.com/. From samuelgiffney at gmail.com Sun Aug 6 17:43:37 2006 From: samuelgiffney at gmail.com (Sam Giffney) Date: Mon, 7 Aug 2006 09:43:37 +1200 Subject: [Ferret-talk] Ruby/Gtk Luke port Message-ID: > Btw, can anyone think of a better name for this than inspector? How about - Fluke ? From dbalmain.ml at gmail.com Sun Aug 6 21:01:16 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 7 Aug 2006 10:01:16 +0900 Subject: [Ferret-talk] Ruby/Gtk Luke port In-Reply-To: <44D44EC1.3070400@benjaminkrause.com> References: <20060804194351.GC16112@cordoba.webit.de> <44D44EC1.3070400@benjaminkrause.com> Message-ID: On 8/5/06, Benjamin Krause wrote: > Jens Kraemer schrieb: > > some days ago I wrote that I once had started porting Luke to Ferret > > with Ruby/Gtk. I just dug out those sources and put them under version > > control. > > > hey .. > > can you give me a brief description, what this is all about? > > Ben Hi Ben, Basically it's just an index inspector. 
It allows you to inspect the documents and term vectors in the index and to scan through term enums and term-doc enums. This can be very helpful during development for debugging purposes and to give you a better idea of how the index works. Cheers, Dave From dbalmain.ml at gmail.com Mon Aug 7 02:41:50 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 7 Aug 2006 15:41:50 +0900 Subject: [Ferret-talk] Trac cleaned of Spam Message-ID: Hi all, I've just cleaned up trac and added the trac spam-filter so there should be fewer problems with spam now. We'll have to wait and see how it goes. I'll also start checking more often so I can get on top of the problem more quickly. If you spot any spam, please let me know or contact me for an admin password so that you can properly roll back spam. Cheers, Dave From paul at mudgrubs.com Tue Aug 8 07:32:00 2006 From: paul at mudgrubs.com (Paul Wright) Date: Tue, 8 Aug 2006 13:32:00 +0200 Subject: [Ferret-talk] acts_as_ferret to search partial phrases and fuzzy Message-ID: Hi All, I was wondering if anyone had experience of extending the AAF plugin for Rails to implement a broader query? The documentation and the demo provided on the http://projects.jkraemer.net/acts_as_ferret/ wiki seem to only match full text queries, or partial when using a * wildcard. Ideally, I am trying to achieve something similar to the following (pseudo code):
def search
  @query = params[:query] || ''
  unless @query.blank?
    @results = Content.find_by_contents @query
    if @results.empty?
      @results = Content.partial_word_search @query
      if @results.empty?
        @results = Content.fuzzy_word_search @query
      end
    end
  end
end
I realise I could probably achieve something like this by changing http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails to meet my needs, but I quite like the indexing elegance provided with AAF, and don't want to have to reinvent the wheel. If anyone has experience or pointers that would be great!
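The exact, then partial, then fuzzy fallback in the pseudo code above can be written without nesting by trying a list of progressively looser query strings. This is only a sketch under stated assumptions: tiered_search is a hypothetical helper, the trailing * follows the wildcard mentioned above, and the trailing ~ for fuzzy matching is assumed query-language syntax that should be checked against the Ferret query parser documentation. The searcher argument is any callable; in a Rails app it might wrap Content.find_by_contents, while here a small stand-in lambda over a fixed word list keeps the example self-contained.

```ruby
# Hypothetical helper: tries progressively looser query strings and returns
# the results of the first one that matches anything.
# `searcher` is any callable taking a query string and returning an Array.
def tiered_search(term, searcher)
  term = term.to_s.strip
  return [] if term.empty?

  queries = [
    term,          # exact term
    "#{term}*",    # wildcard (prefix) match
    "#{term}~"     # fuzzy match (assumed query-language syntax)
  ]
  queries.each do |query|
    results = searcher.call(query)
    return results unless results.empty?
  end
  []
end

# Stand-in for a real search call, backed by a fixed word list.
WORDS = %w[ferret ferrets index search].freeze

fake_searcher = lambda do |query|
  if query.end_with?('*')
    prefix = query.chomp('*')
    WORDS.select { |w| w.start_with?(prefix) }
  elsif query.end_with?('~')
    # Crude fuzzy stand-in: same length, at most one differing character.
    target = query.chomp('~')
    WORDS.select do |w|
      w.size == target.size &&
        w.chars.zip(target.chars).count { |a, b| a != b } <= 1
    end
  else
    WORDS.select { |w| w == query }
  end
end

puts tiered_search('ferret', fake_searcher).inspect  # exact tier
puts tiered_search('ferr', fake_searcher).inspect    # falls through to wildcard tier
puts tiered_search('ferrit', fake_searcher).inspect  # falls through to fuzzy tier
```

Keeping the tiers in an array makes it easy to reorder them or add another one, at the cost of running up to three searches for a term that matches nothing.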
Apologies for posting on the ferret list for a rails-ish problem, but searching the archives seemed to show most of the questions end up here! Thanks, Paul -- Posted via http://www.ruby-forum.com/. From dhanya.gl at gmail.com Wed Aug 9 07:53:21 2006 From: dhanya.gl at gmail.com (Dhanya) Date: Wed, 9 Aug 2006 13:53:21 +0200 Subject: [Ferret-talk] problem when updation takes place Message-ID: <3bf49809cb37587fc708894f92d63d69@ruby-forum.com> Hi, After updating a record I search for that record by name and get two records back: one is the version from before the update, the other is the updated one. I use the after_save method in the model class to create the index entry for new records. How can I solve this problem? -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu Aug 10 04:16:54 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 10 Aug 2006 10:16:54 +0200 Subject: [Ferret-talk] problem when updation takes place In-Reply-To: <3bf49809cb37587fc708894f92d63d69@ruby-forum.com> References: <3bf49809cb37587fc708894f92d63d69@ruby-forum.com> Message-ID: <20060810081653.GA23671@cordoba.webit.de> Hi! On Wed, Aug 09, 2006 at 01:53:21PM +0200, Dhanya wrote: > Hi, > After updating a record I search for that record by name and get two > records back: one is the version from before the update, the other is > the updated one. I use the after_save method in the model class to > create the index entry for new records. > How can I solve this problem? You have to remove the old version of the record from the ferret index. You could have a look at acts_as_ferret for an example of how to do this. Or you could just use acts_as_ferret ;-) Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Thu Aug 10 04:37:37 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 10 Aug 2006 10:37:37 +0200 Subject: [Ferret-talk] acts_as_ferret to search partial phrases and fuzzy In-Reply-To: References: Message-ID: <20060810083737.GA24524@cordoba.webit.de> Hi Paul! By default, aaf uses the stock Ferret QueryParser to parse queries handed to find_by_contents. You have the full power of the Ferret Query Language, including fuzzy search and friends. However, if you give find_by_contents a Query object, this query will be taken as is. So the easiest way to use aaf with custom queries would be to build those queries outside aaf and use them with find_by_contents. Does this already solve your problem? Jens On Tue, Aug 08, 2006 at 01:32:00PM +0200, Paul Wright wrote: > Hi All, > > I was wondering if anyone had experience of extending AAF plugin for > Rails to implement a broader query ? > > The documentation and the demo provided on the > http://projects.jkraemer.net/acts_as_ferret/ wiki seems to only match > full text queries, or partial when using a * wildcard. > > Ideally, I am trying to achieve something similar to the following > (pseudo code): > > def search > @query = params[:query] || '' > unless @query.blank? > @results = Content.find_by_contents @query > if @results.empty? > @results = Content.partial_word_search @query > if @results.empty? > @results = Content.fuzzy_word_search @query > end > end > end > end > > I realise I could probably achieve something like this by changing > http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails to > meet my needs, but I quite like the indexing elegance provided with AAF, > and don't want to have to reinvent the wheel. > > If anyone has experience or pointers that would be great!
> > Apologies for posting on the ferret list for a rails-ish problem, but > searching the archives seemed to show most of the questions end up here! > > Thank, > Paul > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Thu Aug 10 05:03:50 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 10 Aug 2006 11:03:50 +0200 Subject: [Ferret-talk] Frustrating locale setting error In-Reply-To: References: <3fa9f451935e8ea1dc07f6efb8c8bbc2@ruby-forum.com> Message-ID: <20060810090350.GD24524@cordoba.webit.de> On Sat, Aug 05, 2006 at 12:30:28PM +0200, Thiago Jackiw wrote: > > Just FYI, the locale for "en_US" on the fedora box is > > > > en_US > > en_US.iso88591 > > en_US.iso885915 > > en_US.utf8 > > > > > > Thanks. > > Well, just to let you guys know, instead of setting the LANG=utf on > env.rb, I tried en_US.iso88591 and that seems to be working fine for > now. LANG=en_US.utf8 should work, too. At least in theory... Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Thu Aug 10 05:17:12 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 10 Aug 2006 11:17:12 +0200 Subject: [Ferret-talk] Return only results that user is allowed to see? In-Reply-To: References: Message-ID: <20060810091712.GF24524@cordoba.webit.de> On Sun, Aug 06, 2006 at 07:07:39PM +0200, Ian Zabel wrote: > Is it possible with acts_as_ferret to somehow restrict the results that > are returned?
> > For instance, I don't want to return results that are logically deleted > with acts_as_paranoid (deleted_at IS NOT NULL and deleted_at < now()). > Also, if a user is not an Admin, they should not be able to return > results that have a certain value in a certain column, like forum_id != > 13 (if 13 is an admin only forum). > > What's the best way to accomplish this? Do I need to include the > deleted_id and forum_id columns in my index, and then pass in the > appropriate search terms in the controller? If you don't want to filter results _after_ getting them from acts_as_ferret, you need to include the information required for filtering in the index. You could then use custom built queries to constrain searches beyond the query terms given by the user. Another way to achieve this is to use query filters, which can improve speed for repeated searches with the same filter criteria. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From alexander.dean at gmail.com Thu Aug 10 15:55:11 2006 From: alexander.dean at gmail.com (Alex Dean) Date: Thu, 10 Aug 2006 21:55:11 +0200 Subject: [Ferret-talk] Indexing weirdness Message-ID: Hi, I'm having some problems with acts_as_ferret and indexing. I've got a simple 5 column table, and a corresponding Ruby model. I loaded 9 rows directly into the table using mysql (for live it'll be about .5mil rows) and configured a Rails search page along the exact same lines as ferret_demo. Unfortunately, search wouldn't return any results; following the advice given to other posters, I deleted the index/development/[ModelName] folder.
I expected then for the index to automatically get rebuilt, but unfortunately any search then resulted in: : Error occured at :318 Error: exception 2 not handled: Couldn't open the file to read vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:284:in `search' vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:284:in `find_id_by_contents' vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:246:in `find_by_contents' #{RAILS_ROOT}/app/controllers/search_controller.rb:55:in `search' I noticed that the index/development/[ModelName] folder was being recreated, but empty. Luckily I managed to get the rebuild_index.rb script working, which recreated the index folder structure, and then added a couple of files: _a.cfs and segments I can now search without error, but I don't get any search results for the original 9 rows in the table. I can add rows, and these extra rows seem to cause activity in index/development/[ModelName] folder, and they get picked up by search. But the fundamental problem remains, that I can't get Ferret to search and return the original 9 rows. What am I doing wrong? Thanks, Alex -- Posted via http://www.ruby-forum.com/. 
From alexander.dean at gmail.com Thu Aug 10 18:52:00 2006 From: alexander.dean at gmail.com (Alex Dean) Date: Fri, 11 Aug 2006 00:52:00 +0200 Subject: [Ferret-talk] Timeout when rebuilding index Message-ID: <817f9c03394c269a2dc956ea22a52827@ruby-forum.com> Sorry to post twice in a row, but I also wanted help with my index rebuild process - I seem to be getting timeouts when I run the rebuild_index.rb script, like this: /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/abstract_adapter.rb:120:in `log': Mysql::Error: Lost connection to MySQL server during query: SELECT * FROM search_queries (ActiveRecord::StatementInvalid) from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:185:in `execute' from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:337:in `select' deprecated_finders.rb:37:in `find_all' from rebuild_index.rb:12 To give some context: this is the rebuild_index script which lives in vendor/plugins/acts_as_ferret, and it's working on a single table with 6 columns in it, and approx. 200 meg of data. This table is mapped onto a single Model, which uses acts_as_ferret in (what I think) is a very straightforward fashion - just the fields and boosts. Happy to give table definition if you think that's important... What am I doing wrong - is Ferret not meant for this sort of application? Thanks, Alex -- Posted via http://www.ruby-forum.com/. From contact at ezabel.com Fri Aug 11 00:14:46 2006 From: contact at ezabel.com (Ian Zabel) Date: Fri, 11 Aug 2006 06:14:46 +0200 Subject: [Ferret-talk] Return only results that user is allowed to see? In-Reply-To: References: Message-ID: <75a2eb03ac6d1abb95fec5d98ab27b2f@ruby-forum.com> Just wanted to clarify a bit. 
I guess what I'm basically asking is how can I use this kind of condition: :conditions => 'forum_id != 13' On this model: class Comment < ActiveRecord::Base acts_as_paranoid acts_as_ferret :fields => [ 'comment' ] end Do I need to add forum_id to the list of fields? And how do I write the query? This doesn't seem to work... Comment.find_by_contents("test", :conditions => 'forum_id != 13') Thanks, Ian. -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Fri Aug 11 01:05:06 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 11 Aug 2006 14:05:06 +0900 Subject: [Ferret-talk] Return only results that user is allowed to see? In-Reply-To: <75a2eb03ac6d1abb95fec5d98ab27b2f@ruby-forum.com> References: <75a2eb03ac6d1abb95fec5d98ab27b2f@ruby-forum.com> Message-ID: On 8/11/06, Ian Zabel wrote: > Just wanted to clarify a bit. I guess what I'm basically asking is how > can I use this kind of condition: > > :conditions => 'forum_id != 13' > > On this model: > > class Comment < ActiveRecord::Base > acts_as_paranoid > acts_as_ferret :fields => [ 'comment' ] > end > > > Do I need to add forum_id to the list of fields? And how do I write the > query? > > This doesn't seem to work... > Comment.find_by_contents("test", :conditions => 'forum_id != 13') > > Thanks, > Ian. Hi Ian, Try adding 'forum_id' to your list of fields and write the query like this; Comment.find_by_contents("test AND NOT forum_id:13") or; Comment.find_by_contents("test -forum_id:13") Hope that helps, Dave From davidsmit at gmail.com Fri Aug 11 23:19:23 2006 From: davidsmit at gmail.com (David Smit) Date: Sat, 12 Aug 2006 05:19:23 +0200 Subject: [Ferret-talk] Ferret Wierdness Message-ID: Hi, I have installed the ferret gem (on WIN XP). My search and result view is working, but it comes back with empty results all the time. It seems like ferret is building the index (there is 3 binary files in the index directory). 
If I add the line: acts_as_ferret :fields => ['title','description','price','website_url'] to my advert model I get the following error: undefined method `acts_as_ferret' for Advert:Class Please help, David -- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Sat Aug 12 01:19:41 2006 From: jan.prill at gmail.com (Jan Prill) Date: Sat, 12 Aug 2006 07:19:41 +0200 Subject: [Ferret-talk] Ferret Wierdness In-Reply-To: References: Message-ID: <562a35c10608112219t4b5ffad5s9a91922603b1af75@mail.gmail.com> Hi David, have you installed the acts_as_ferret plugin as well as the ferret gem? acts_as_ferret is a convenience library that includes ferret in your rails application. You'll find installation instructions on the acts_as_ferret trac: http://projects.jkraemer.net/acts_as_ferret/ Cheers, Jan On 8/12/06, David Smit wrote: > > Hi, > > I have installed the ferret gem (on WIN XP). My search and result view > is working, but it comes back with empty results all the time. It seems > like ferret is building the index (there is 3 binary files in the index > directory). > > If I add the line: > acts_as_ferret :fields => ['title','description','price','website_url'] > > to my advert model I get the following error: > undefined method `acts_as_ferret' for Advert:Class > > Please help, > > David > > -- > Posted via http://www.ruby-forum.com/.
From kraemer at webit.de Sat Aug 12 04:57:21 2006 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 12 Aug 2006 10:57:21 +0200 Subject: [Ferret-talk] Timeout when rebuilding index In-Reply-To: <817f9c03394c269a2dc956ea22a52827@ruby-forum.com> References: <817f9c03394c269a2dc956ea22a52827@ruby-forum.com> Message-ID: <20060812085721.GA4149@cordoba.webit.de> On Fri, Aug 11, 2006 at 12:52:00AM +0200, Alex Dean wrote: > Sorry to post twice in a row, but I also wanted help with my index > rebuild process - I seem to be getting timeouts when I run the > rebuild_index.rb script, like this: > > /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/abstract_adapter.rb:120:in > `log': Mysql::Error: Lost connection to MySQL server during query: > SELECT * FROM search_queries (ActiveRecord::StatementInvalid) > from > /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:185:in > `execute' > from > /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/mysql_adapter.rb:337:in > `select' > > > > deprecated_finders.rb:37:in `find_all' > from rebuild_index.rb:12 > > To give some context: this is the rebuild_index script which lives in > vendor/plugins/acts_as_ferret, and it's working on a single table with 6 > columns in it, and approx. 200 meg of data. > > This table is mapped onto a single Model, which uses acts_as_ferret in > (what I think) is a very straightforward fashion - just the fields and > boosts. You shouldn't use that rebuild_index script any more - it has been removed from recent versions of the plugin and was replaced by a class method named rebuild_index. Just call this (via YourModel.rebuild_index) to rebuild your index. Maybe the problem with your query is simply the amount of data fetched by find_all (how many rows does your table have?).
Version 0.2.2 of the plugin does a chunked index rebuild (fetching 1000 rows at a time, then indexing these, and so on). See http://projects.jkraemer.net/acts_as_ferret/wiki for more information on versions of the plugin, their compatibility with Ferret versions and how to install them. [..] > What am I doing wrong - is Ferret not meant for this sort of > application? Don't see what you could be doing wrong - maybe your mysql server logs state something meaningful? Regards, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Sat Aug 12 05:05:23 2006 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 12 Aug 2006 11:05:23 +0200 Subject: [Ferret-talk] Indexing weirdness In-Reply-To: References: Message-ID: <20060812090523.GB4149@cordoba.webit.de> Hi Alex! On Thu, Aug 10, 2006 at 09:55:11PM +0200, Alex Dean wrote: > Hi, > > I'm having some problems with acts_as_ferret and indexing. I've got a > simple 5 column table, and a corresponding Ruby model. I loaded 9 rows > directly into the table using mysql (for live it'll be about .5mil rows) > and configured a Rails search page along the exact same lines as > ferret_demo. > > Unfortunately, search wouldn't return any results; following the advice > given to other posters, I deleted the index/development/[ModelName] > folder. I expected then for the index to automatically get rebuilt, but > unfortunately any search then resulted in: the index should be rebuilt if you stop your application, delete the folder as you did, and then start the app again. If that doesn't work, please try upgrading to the latest version (0.2.2) of the plugin.
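For what a chunked rebuild like the one in 0.2.2 roughly amounts to (fetch 1000 rows, index them, repeat), here is a minimal sketch in plain Ruby. It is not the plugin's code: FakeIndex, ROWS and rebuild_in_chunks are made-up stand-ins for the Ferret index, the database table and the rebuild routine, chosen only so the example runs without Rails or MySQL.

```ruby
# Hypothetical stand-in for the Ferret index; it only collects documents.
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = []
  end

  def <<(doc)
    @docs << doc
    self
  end
end

# Fetches rows batch by batch so that no single query has to return the
# whole table. The block receives (offset, limit) and must return an array
# of rows; an empty array ends the rebuild. Returns the number of batches.
def rebuild_in_chunks(index, batch_size = 1000)
  offset = 0
  batches = 0
  loop do
    rows = yield(offset, batch_size)
    break if rows.empty?
    rows.each { |row| index << row }
    offset += rows.size
    batches += 1
  end
  batches
end

# Hypothetical table contents: 2500 rows held in memory.
ROWS = (1..2500).map { |i| { :id => i, :query => "term #{i}" } }

index = FakeIndex.new
batches = rebuild_in_chunks(index) { |offset, limit| ROWS[offset, limit] || [] }
puts "indexed #{index.docs.size} rows in #{batches} batches"
```

In a real application the block would presumably wrap a finder call with a limit and an offset instead of slicing an array; the point is only that each query stays small, which may help with the lost-connection error on large tables.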
> : Error occured at :318 > Error: exception 2 not handled: Couldn't open the file to read > > vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:284:in `search' > vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:284:in > `find_id_by_contents' > vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:246:in > `find_by_contents' > #{RAILS_ROOT}/app/controllers/search_controller.rb:55:in `search' > > I noticed that the index/development/[ModelName] folder was being > recreated, but empty. > > Luckily I managed to get the rebuild_index.rb script working, which > recreated the index folder structure, and then added a couple of files: > _a.cfs and segments > > I can now search without error, but I don't get any search results for > the original 9 rows in the table. I can add rows, and these extra rows > seem to cause activity in index/development/[ModelName] folder, and they > get picked up by search. > > But the fundamental problem remains, that I can't get Ferret to search > and return the original 9 rows. hm, are you sure these 9 rows look the same as those created by your later inserts? Does loading/saving them in Rails cause them to get indexed? Have a look at your development log to see what happens when doing so; acts_as_ferret logs quite exactly what it is doing. hth, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From bill at ilovett.com Sat Aug 12 10:30:40 2006 From: bill at ilovett.com (Bill Lovett) Date: Sat, 12 Aug 2006 16:30:40 +0200 Subject: [Ferret-talk] How do empty strings affect sorting? Message-ID: I'm creating a search that allows results to be sorted in different ways. In defining the sortable fields, I was careful to use untokenized indexes. Everything was working great except for one field-- it refused to sort properly, even though all the others were fine.
It seems as if the presence of empty strings in my data were to blame. By setting them to a default value, sorting on that field suddenly worked fine. Why is that? The same failure happened when I changed the empty strings to nulls. Do I always have to check for empty strings or nulls when defining sort fields? -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sat Aug 12 11:53:39 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 13 Aug 2006 00:53:39 +0900 Subject: [Ferret-talk] How do empty strings affect sorting? In-Reply-To: References: Message-ID: On 8/12/06, Bill Lovett wrote: > I'm creating a search that allows results to be sorted in different > ways. In defining the sortable fields, I was careful to use untokenized > indexes. Everything was working great except for one field-- it refused > to sort properly, even though all the others were fine. > > It seems as if the presence of empty strings in my data were to blame. > By setting them to a default value, sorting on that field suddenly > worked fine. Why is that? The same failure happened when I changed the > empty strings to nulls. > > Do I always have to check for empty strings or nulls when defining sort > fields? Hi Bill, This is a bug which has sort of been fixed in the latest version. I say sort of because the solution is not really ideal. For integer or float fields the default value is set to 0. Ideally, I think undefined values should come after defined values no matter what the order but this is a little harder to do with the current implementation. It works for string fields but not for integer and float fields. Cheers, Dave From chris.lowis at gmail.com Sat Aug 12 13:30:35 2006 From: chris.lowis at gmail.com (Chris Lowis) Date: Sat, 12 Aug 2006 19:30:35 +0200 Subject: [Ferret-talk] How to add # anchor to ferret generated url ? 
Message-ID: <835de7d463d24246c60017a9e6e06932@ruby-forum.com> I have the following action in my controller def search @query = params[:query] || '' unless @query.blank? @results = Residence.find_by_contents @query end end In my View I have <% if @results -%> ... <% end %> When the user clicks "search" the generated url is eg: http://localhost:3000/?query=10013&commit=search How would I append a #results to this url so that the browser window "jumps" to the result output part of the page ? I apologise if this is not a ferret-specific question. Thank you for your help, Chris -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Sun Aug 13 07:10:15 2006 From: kraemer at webit.de (Jens Kraemer) Date: Sun, 13 Aug 2006 13:10:15 +0200 Subject: [Ferret-talk] How to add # anchor to ferret generated url ? In-Reply-To: <835de7d463d24246c60017a9e6e06932@ruby-forum.com> References: <835de7d463d24246c60017a9e6e06932@ruby-forum.com> Message-ID: <20060813111014.GA11381@cordoba.webit.de> Hi! On Sat, Aug 12, 2006 at 07:30:35PM +0200, Chris Lowis wrote: [..] > In my View I have > >