From kelly.terry.jones at gmail.com Wed Jul 1 00:19:34 2009
From: kelly.terry.jones at gmail.com (Kelly Jones)
Date: Tue, 30 Jun 2009 21:19:34 -0700
Subject: [Ferret-talk] Ferret indexing openstreetmap.org data
Message-ID: <26face530906302119m1cd935c7wda733fb383bf1943@mail.gmail.com>

I want to use Ferret to index openstreetmap.org's (OSM) node data: 400M
pairs of latitudes and longitudes.

Has anyone already done this? What optimizations can I use? I don't
need to access the data while it's indexing, if that helps.

Can grid computing help here? Can I index chunks of data separately
and then efficiently merge the indexes?

I realize OSM has lots more data (ways, tags, relations, etc), but
indexing the nodes (to find all nodes within a given
latitude/longitude range) would be my first step.

I also realize MySQL or PostgreSQL could do this, but I'm looking for
an embedded/serverless solution and sqlite3 indexing is too slow.

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.

From kelly.terry.jones at gmail.com Mon Jul 6 10:15:37 2009
From: kelly.terry.jones at gmail.com (Kelly Jones)
Date: Mon, 6 Jul 2009 07:15:37 -0700
Subject: [Ferret-talk] Using ferret as a base64-encoded numerical db
Message-ID: <26face530907060715u6ecf25e4y582ac96545a8f015@mail.gmail.com>

I'm using ferret to store random base64 strings of length 72 (courtesy
"dd if=/dev/random ... | mmencode"), with the long-term goal of storing
floating point/integral numbers (converted to base64).

Problems:

% Ferret regards the base64 characters "+" and "/" as word separators,
so a search for "content:[xji xjj]" yields things like
"FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6"
where "xji" appears after a plus sign. How to avoid this? I could
change "+" to "_", but I'm not sure changing "/" to "." or ":" or "-"
or "!" would work.

% Ferret's default search is case-insensitive, so I get things like
"xJiQf0PEagWJME9Tf5pFu6dk4UGGFw5Lc0PIfa9N70Mb2IG2IWO36VCsC0y7Q1zOrLjk2Lz4",
which match "xJi" but not "xji". How to fix?

% When I do a range query, does ferret return *all* documents matching
the query or only the highest scoring 10? For my purposes, I need
*all* documents matching a query, not just the first few.

Is anyone else using ferret as a db? Since it's hash-based, it's much
faster at indexing large numbers of strings than sqlite3.

I realize I could just 0-pad my numbers (eg, "000005" for 5), but I've
got a LOT of data (400M pairs of floating point numbers), so I prefer
compactness.

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.

From kelly.terry.jones at gmail.com Mon Jul 6 10:17:14 2009
From: kelly.terry.jones at gmail.com (Kelly Jones)
Date: Mon, 6 Jul 2009 07:17:14 -0700
Subject: [Ferret-talk] ferret-browser on the command-line?
Message-ID: <26face530907060717l7c316f73q46fe7cce4725b5dd@mail.gmail.com>

ferret-browser is really neat and tells me a lot about my ferret
collection, but I'm not sure why it runs as a webserver?

Is there a command-line program that does the same thing
ferret-browser does?

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.

From philip at prettybycritty.com Mon Jul 6 11:01:53 2009
From: philip at prettybycritty.com (Philip Ingram)
Date: Mon, 6 Jul 2009 11:01:53 -0400
Subject: [Ferret-talk] ferret-browser on the command-line?
In-Reply-To: <26face530907060717l7c316f73q46fe7cce4725b5dd@mail.gmail.com>
References: <26face530907060717l7c316f73q46fe7cce4725b5dd@mail.gmail.com>
Message-ID: <257DE5D1-D4EE-467F-9669-DD951077269F@prettybycritty.com>

kelly, i'd be interested to find out how you got ferret-browser to
work.

thanks.

On 6-Jul-09, at 10:17 AM, Kelly Jones wrote:

> ferret-browser is really neat and tells me a lot about my ferret
> collection, but I'm not sure why it runs as a webserver?
>
> Is there a command-line program that does the same thing
> ferret-browser does?

From u.alberton at gmail.com Mon Jul 13 16:32:37 2009
From: u.alberton at gmail.com (Bira)
Date: Mon, 13 Jul 2009 17:32:37 -0300
Subject: [Ferret-talk] Questions about query and index loading performance.
Message-ID:

Hello,

I have a few questions about Ferret's performance when loading and
querying indexes.

How do load time and query time scale with the number of indexes being
loaded or searched at once? Does the size of these indexes matter?

From what I have seen here, the time it takes to load the indexes
seems to scale more or less linearly (i.e., O(n)) with the number of
indexes to load, but not necessarily with their size. Is that correct?
What about query times?

On a semi-related note, is there a way to figure out how many
documents there are in an index without running a query on it?

--
Bira
http://compexplicita.wordpress.com
http://compexplicita.tumblr.com