From john at johnleach.co.uk Sat Apr 3 06:44:30 2010
From: john at johnleach.co.uk (John Leach)
Date: Sat, 03 Apr 2010 11:44:30 +0100
Subject: [Xapian-fu-discuss] SOS for xapian-fu
In-Reply-To: <323AB3E0-4634-4529-A4F4-95ABF606B85C@gearboxsoft.com>
References: <323AB3E0-4634-4529-A4F4-95ABF606B85C@gearboxsoft.com>
Message-ID: <4BB71C0E.60802@johnleach.co.uk>

Hi William,

it looks like you're just missing (or half missing) the xapian library
itself (which is a C++ library with Ruby wrappers, that Xapian Fu uses).

I don't know anything about OSX, but the Xapian website says:

"The Fink project has packages for xapian-core, Omega, and the Python
and Ruby bindings. Alternatively, MacPorts has packages for
xapian-core."

http://xapian.org/download

Hope that helps,

John.

On 01/04/10 11:00, Yeung William wrote:
> Hi there,
>
> Sorry to bug you directly but I am screwed here and tried quite a bit google work without luck.
>
> I tried to use xapian-fu on my snow leopard but it seems no longer working! I tried to install xapian gem but it fails:
>
> SASABI:blog goodwill$ sudo gem install xapian
> Password:
> Building native extensions. This could take a while...
> ERROR: Error installing xapian:
> ERROR: Failed to build gem native extension.
>
> /opt/ruby-enterprise/bin/ruby extconf.rb
> ./configure --with-ruby
> checking for a BSD-compatible install... /usr/bin/install -c
> checking whether build environment is sane... yes
> checking for a thread-safe mkdir -p... ./install-sh -c -d
> checking for gawk... no
> checking for mawk... no
> checking for nawk... no
> checking for awk... awk
> checking whether make sets $(MAKE)... yes
> checking how to create a ustar tar archive... gnutar
> checking build system type... i386-apple-darwin10.3.0
> checking host system type... i386-apple-darwin10.3.0
> checking for style of include used by make... GNU
> checking for gcc... gcc
> checking for C compiler default output file name... a.out
> checking whether the C compiler works... yes
> checking whether we are cross compiling... no
> checking for suffix of executables...
> checking for suffix of object files... o
> checking whether we are using the GNU C compiler... yes
> checking whether gcc accepts -g... yes
> checking for gcc option to accept ISO C89... none needed
> checking dependency style of gcc... gcc3
> checking for a sed that does not truncate output... /usr/bin/sed
> checking for grep that handles long lines and -e... /usr/bin/grep
> checking for egrep... /usr/bin/grep -E
> checking for ld used by gcc... /usr/libexec/gcc/i686-apple-darwin10/4.2.1/ld
> checking if the linker (/usr/libexec/gcc/i686-apple-darwin10/4.2.1/ld) is GNU ld... no
> checking for /usr/libexec/gcc/i686-apple-darwin10/4.2.1/ld option to reload object files... -r
> checking for BSD-compatible nm... /usr/bin/nm
> checking whether ln -s works... yes
> checking how to recognize dependent libraries... pass_all
> checking how to run the C preprocessor... gcc -E
> checking for ANSI C header files... yes
> checking for sys/types.h... yes
> checking for sys/stat.h... yes
> checking for stdlib.h... yes
> checking for string.h... yes
> checking for memory.h... yes
> checking for strings.h... yes
> checking for inttypes.h... yes
> checking for stdint.h... yes
> checking for unistd.h... yes
> checking dlfcn.h usability... yes
> checking dlfcn.h presence... yes
> checking for dlfcn.h... yes
> checking for g++... g++
> checking whether we are using the GNU C++ compiler... yes
> checking whether g++ accepts -g... yes
> checking dependency style of g++... gcc3
> checking how to run the C++ preprocessor... g++ -E
> checking the maximum length of command line arguments... 196608
> checking command to parse /usr/bin/nm output from gcc object... ok
> checking for objdir... .libs
> checking for ar... ar
> checking for ranlib... ranlib
> checking for strip... strip
> checking for dsymutil... dsymutil
> checking for nmedit... nmedit
> checking for -single_module linker flag... yes
> checking for -exported_symbols_list linker flag... yes
> checking if gcc supports -fno-rtti -fno-exceptions... no
> checking for gcc option to produce PIC... -fno-common
> checking if gcc PIC flag -fno-common works... yes
> checking if gcc static flag -static works... no
> checking if gcc supports -c -o file.o... yes
> checking whether the gcc linker (/usr/libexec/gcc/i686-apple-darwin10/4.2.1/ld) supports shared libraries... yes
> checking dynamic linker characteristics... darwin10.3.0 dyld
> checking how to hardcode library paths into programs... immediate
> checking whether stripping libraries is possible... yes
> checking if libtool supports shared libraries... yes
> checking whether to build shared libraries... yes
> checking whether to build static libraries... no
> checking whether we are using the GNU C++ compiler... (cached) yes
> checking whether g++ accepts -g... (cached) yes
> checking dependency style of g++... (cached) gcc3
> checking for xapian-config... no
> configure: error: Can't find xapian-config, although the xapian-core runtime library seems to be installed. If you've installed xapian-core from a package, you probably need to install an extra package called something like xapian-core-devel in order to be able to build code using the Xapian library.
> extconf.rb:3:in `system!': unhandled exception
> from extconf.rb:5
>
>
> Gem files will remain installed in /opt/ruby-enterprise/lib/ruby/gems/1.8/gems/xapian-1.0.15 for inspection.
> Results logged to /opt/ruby-enterprise/lib/ruby/gems/1.8/gems/xapian-1.0.15/gem_make.out
>
> I tried to install xapian-full that works, but still that seems not helping xapian-fu to work with xapian either. Any idea how to fix this?
>
> Regards,
>
> William Yeung

From john at johnleach.co.uk Wed Apr 21 18:11:29 2010
From: john at johnleach.co.uk (John Leach)
Date: Wed, 21 Apr 2010 23:11:29 +0100
Subject: [Xapian-fu-discuss] Xapian-fu: Any chance I can fully disable stopword list?
In-Reply-To: <4EF94B23-A1B9-4A1C-AD3E-8757A9FAE0AD@gearboxsoft.com>
References: <4EF94B23-A1B9-4A1C-AD3E-8757A9FAE0AD@gearboxsoft.com>
Message-ID: <1271887889.4685.84.camel@dogen>

Hi William,

this is a bug! The documentation was lying, the code didn't support
setting false as a stopper.

I've fixed the code now and made a new release of XapianFu, version
1.1.1. You can grab it from github:

http://github.com/johnl/xapian-fu/downloads

or wait until I'm done fighting with gem cutter and it'll be in the gem
repository.

Thanks for the bug report!

John.

On Tue, 2010-04-20 at 16:26 +0800, Yeung William wrote:
> As title, coz I am using xapian for chinese too, which I don't have a
> good stopword list file for that. I thought documentation said set it
> to false would work, but it busted with UnsupportedStopperLanguage
> (lang = false). I saw the stemmer file can be disabled, how about
> stopper?
>
> Regards,
>
>
> William Yeung

From john at johnleach.co.uk Thu Apr 22 03:26:22 2010
From: john at johnleach.co.uk (John Leach)
Date: Thu, 22 Apr 2010 08:26:22 +0100
Subject: [Xapian-fu-discuss] Xapian-fu: Chinese text indexing
In-Reply-To: <8B8AA256-B69A-41BC-92CB-DF7EB769000B@gearboxsoft.com>
References: <4EF94B23-A1B9-4A1C-AD3E-8757A9FAE0AD@gearboxsoft.com>
 <1271887889.4685.84.camel@dogen>
 <8B8AA256-B69A-41BC-92CB-DF7EB769000B@gearboxsoft.com>
Message-ID: <1271921182.12707.48.camel@dogen>

On Thu, 2010-04-22 at 12:13 +0800, Yeung William wrote:
> By the way, I am trying to do some chinese text indexing, but as
> chinese doesn't have separator character like english, that doesn't
> seem to work well, any suggestion?

Hi William,

I've not had experience with indexing Chinese text myself but I can't
think of any problems with XapianFu. You'd need to just disable the
stemmer (as I understand, Chinese has few stems). The
XapianFu::SimpleStopper would work fine if you provided your own stop
list, as it is given an entry at a time to test - it doesn't need to
worry about separator characters.

You'd then need to concentrate on Xapian itself. It sounds like you'd
just need to write a TermGenerator and a QueryParser.

The TermGenerator currently uses whitespace to tokenize text into terms
(which are then passed through the stopper and then indexed). You'd
need a TermGenerator that understood how to tokenize Chinese. And you'd
need to write a QueryParser too, which also currently considers
whitespace.

More info here, with some links to Chinese tokenizing libraries too:

http://grokbase.com/topic/2008/02/26/xapian-discuss-chinese-japanese-index-support/dMTg2EYLUZM1eA6clai9WFCjIlM

John.
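
The pipeline described above (tokenize, pass each term through the
stopper, then index) can be sketched in plain Ruby with no Xapian
dependency. This is only an illustration under stated assumptions: the
names CjkBigramTokenizer, ListStopper and index_terms are hypothetical
and are not part of XapianFu or Xapian, and splitting Chinese runs into
overlapping character bigrams is an assumed fallback for text without
separators, not something proposed in this thread. A real setup would
more likely use one of the tokenizing libraries linked above.

```ruby
require 'set'

# Hypothetical stand-in for a stopper: it is asked about one term at a
# time, so it never needs to know how the text was split.
class ListStopper
  def initialize(words)
    @words = Set.new(words)
  end

  def stop_word?(term)
    @words.include?(term)
  end
end

# Hypothetical tokenizer (not a Xapian class). Runs of Han characters
# become overlapping two-character terms; runs of ASCII letters and
# digits are kept whole and downcased.
module CjkBigramTokenizer
  TOKEN = /\p{Han}+|[A-Za-z0-9]+/

  def self.tokenize(text)
    text.scan(TOKEN).flat_map do |run|
      if run =~ /\A\p{Han}/
        chars = run.chars
        # A lone Han character has no neighbour to pair with.
        chars.length == 1 ? [run] : chars.each_cons(2).map(&:join)
      else
        [run.downcase]
      end
    end
  end
end

# Tokenize, then drop stop words. Passing no stopper keeps every term,
# which is the "stopper disabled" case.
def index_terms(text, stopper = nil)
  CjkBigramTokenizer.tokenize(text).reject do |term|
    stopper && stopper.stop_word?(term)
  end
end

index_terms("the 搜索引擎", ListStopper.new(["the"]))
# => ["搜索", "索引", "引擎"]
```

One consequence of this design is that a stop list of single Chinese
words would never match the two-character terms, so with bigrams the
stop list would have to hold bigrams too, or be left out entirely.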