From: simon.a.chiang at gmail.com (Simon Chiang)
Date: Wed, 1 Apr 2009 09:56:10 -0600
Subject: [Nokogiri-talk] Help for a namespace newbie

Hi, I just started messing around with Nokogiri and now I'm confused. I know example b returns nil because of the namespace, but I don't really understand why, or how to work around it.

    a = Nokogiri::XML %q{content}
    puts a.root.at("/root")  # => root node

    b = Nokogiri::XML %q{content}
    puts b.root.at("/root")  # => nil

Obviously 'blah' may be invalid simply because it's made up, but the same thing happens with a valid http namespace (valid in that I can go to the URL and get something that looks like an XML namespace... uh... declaration?).

Any help is appreciated, thanks.

From: bobf at jhu.edu (Bob Fitterman)
Date: Wed, 1 Apr 2009 15:51:44 -0400
Subject: [Nokogiri-talk] Advice on New Install to Local Directories

I'm hoping this is the right place to get some help. I recently switched from Hpricot to Nokogiri and love the huge performance improvement. I do my development on Leopard, where it was a piece of cake to install. Unfortunately, I'm hosted on a shared environment, so things get a little messier. I installed libxml2-2.7.3 and libxslt-1.1.24 into my own environment (they sit in local/lib and local/include off my home directory). When I run the gem install, it complains:

    /usr/bin/ruby1.8 extconf.rb install nokogiri -V
    checking for iconv.h in /usr/include,/opt/local/include,/usr/local/include,/usr/include... yes
    checking for libxml/parser.h in /opt/local/include/,/opt/local/include/libxml2,/usr/include/libxml2,/usr/include,/usr/local/include/libxml2,/usr/include/libxml2... yes
    checking for libxslt/xslt.h in /opt/local/include/,/opt/local/include/libxml2,/usr/include/libxml2,/usr/include,/usr/local/include/libxml2,/usr/include/libxml2... no
    libxslt is missing.  try 'port install libxslt' or 'yum install libxslt-devel'
    *** extconf.rb failed ***
    Could not create Makefile due to some reason, probably lack of
    necessary libraries and/or headers.  Check the mkmf.log file for more
    details.  You may need configuration options.

I downloaded the tarball and tried running "ruby extconf.rb --with-xslt-dir=/home/myname/local" and an assortment of similar combinations, but I can't seem to make it work. What's the secret?

Thanks.

From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Wed, 1 Apr 2009 13:52:16 -0700
Subject: [Nokogiri-talk] Advice on New Install to Local Directories

Hi Bob,

2009/4/1 Bob Fitterman:
> I downloaded the tarball and tried running "ruby extconf.rb
> --with-xslt-dir=/home/myname/local" and an assortment of combinations like
> that, but I can't seem to make it work. What's the secret?

I assume that you've already installed libxml2 and libxslt? They're just in your local directory? Your extconf parameters look fine. Can you send me the mkmf.log *after* running with your custom params?

--
Aaron Patterson
http://tenderlovemaking.com/

From: bobf at jhu.edu (Bob Fitterman)
Date: Thu, 2 Apr 2009 13:57:09 -0600
Subject: [Nokogiri-talk] Advice on New Install to Local Directories

From this directory: ~/.gems/gems/nokogiri-1.2.3/ext/nokogiri

If I do this:

    ruby extconf.rb --opt-dir=/home/myname/local/

I see this:

    checking for iconv.h in /usr/include,/opt/local/include,/usr/local/include,/usr/include... yes
    checking for libxml/parser.h in /opt/local/include/,/opt/local/include/libxml2,/usr/include/libxml2,/usr/include,/usr/local/include/libxml2,/usr/include/libxml2... yes
    checking for libxslt/xslt.h in /opt/local/include/,/opt/local/include/libxml2,/usr/include/libxml2,/usr/include,/usr/local/include/libxml2,/usr/include/libxml2... no
    libxslt is missing.  try 'port install libxslt' or 'yum install libxslt-devel'
    *** extconf.rb failed ***

I figured this would tell it to look in the right directories, but you'll notice it's not searching through MY copies; it's looking out in /usr/local. Everything is installed in /home/myname/local. Specifically, libxslt/xslt.h can be found in /home/myname/local/include/libxslt.

I'm sure I'm missing something really basic about how to get it to build against the local install. Once I get to that point, I think the contents of mkmf.log will be relevant.

Thanks.
Bob

From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 2 Apr 2009 13:57:39 -0700
Subject: [Nokogiri-talk] Advice on New Install to Local Directories

2009/4/2 Bob Fitterman:
> From this directory: ~/.gems/gems/nokogiri-1.2.3/ext/nokogiri
> If I do this: ruby extconf.rb --opt-dir=/home/myname/local/

Try something like this:

    ruby extconf.rb --with-xslt-dir=/home/myname/local

You should see '/home/myname/local' show up in the searched directories.
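For readers following along: assuming this extconf.rb follows mkmf's usual dir_config convention, a flag of the form --with-xslt-dir=PREFIX adds PREFIX/include to the header search path and PREFIX/lib to the library path. Before re-running the build, it may be worth confirming that the headers the log checks for really sit under those directories. The helper below is a minimal, hypothetical sketch using only Ruby's standard library; check_prefix is not part of Nokogiri or mkmf, and the include/libxml2 location assumes libxml2's default install layout.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not part of Nokogiri or mkmf). Under mkmf's dir_config
# convention, --with-xslt-dir=PREFIX means headers in PREFIX/include and
# libraries in PREFIX/lib. Report whether the expected files are present.
def check_prefix(prefix)
  include_dir = File.join(prefix, 'include')
  {
    'libxslt/xslt.h'  => File.file?(File.join(include_dir, 'libxslt', 'xslt.h')),
    # Assumes libxml2's default layout, which nests its headers in include/libxml2/
    'libxml/parser.h' => File.file?(File.join(include_dir, 'libxml2', 'libxml', 'parser.h')),
    'lib directory'   => File.directory?(File.join(prefix, 'lib'))
  }
end

# Demonstration on a scratch prefix that contains only the libxslt header.
Dir.mktmpdir do |prefix|
  FileUtils.mkdir_p(File.join(prefix, 'include', 'libxslt'))
  FileUtils.touch(File.join(prefix, 'include', 'libxslt', 'xslt.h'))
  p check_prefix(prefix)
end
```

On the host described in this thread, the call would be check_prefix(File.expand_path('~/local')); any false entry points at a path the build will not find.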
--
Aaron Patterson
http://tenderlovemaking.com/

From: bobf at jhu.edu (Bob Fitterman)
Date: Fri, 3 Apr 2009 19:26:56 -0600
Subject: [Nokogiri-talk] Advice on New Install to Local Directories

Aaron, thanks, it's getting closer, but there are 2 problems.

Getting there. The extconf and make steps work now, but "make install" tries to install to /usr/lib, where I don't have permission. I think all I need is some guidance on this last step and I'll be all set. Thanks for your patience with this.

Bob

From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Fri, 3 Apr 2009 18:49:03 -0700
Subject: [Nokogiri-talk] Advice on New Install to Local Directories

On Fri, Apr 3, 2009 at 6:26 PM, Bob Fitterman wrote:
> Getting there. The extconf and make work now, but the "make install" tries
> to install to /usr/lib where I don't have permission.

No problem. Don't use 'make install' to install the gem. The gem must be installed via the 'gem' command.

You can give the extconf.rb parameters to the gem command, and it will use them while compiling, but install to your normal gem directory. The command goes like this:

    $ gem install nokogiri -- --with-libxslt-dir=/whatever/

All of the flags after the double dash will be supplied to extconf.rb.

Hope that helps.

--
Aaron Patterson
http://tenderlovemaking.com/

From: bobf at jhu.edu (Bob Fitterman)
Date: Fri, 3 Apr 2009 21:05:23 -0600
Subject: [Nokogiri-talk] Advice on New Install to Local Directories

BINGO! Thanks a million. Your package is a lifesaver for my sluggish server. Keep at it.

Bob

From: phlip2005 at gmail.com (Phlip)
Date: Fri, 03 Apr 2009 21:38:09 -0700
Subject: [Nokogiri-talk] [Ann] Verify, a very basic testing tool.

Tom Cloyd wrote:
> Damn. I love this sudden burst of creativity around the topic of
> testing. A Good Thing.

Oookay. Here's a sneak preview of assert{ 2.0 } 0.4.8.

You know how Ajax works by generating JavaScript and slinging it at your web browser? And you know how Rails purportedly tests it with "assert_rjs"? Here's a sample:

    assert_rjs :replace_html, "advanced_filter", ""

Too cute, right? Wrong! That expands to nothing but a big Regexp, like /Element.update.*advanced_filter/. So a payload of "advanced_filter", or even a subsequent Element.update('advanced_filter'), could fool it.

Further, at work we do a lot of in-house Ajax, so we are at liberty to render entire partials at whim. We require our teeming minions to use only Firefox. But all assert_rjs does with its third argument is drop it into assert_match. That is not powerful enough to constrain our apps!
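The weakness of a whole-payload Regexp can be shown in a few lines of plain Ruby. LOOSE below only approximates the kind of pattern described above (the exact Regexp Rails builds is not shown in the post), and updated_ids is a hypothetical helper. Extracting the first quoted argument of each Element.update call is stricter than searching the whole string, though it is still a regex and not the real JavaScript parse that the post goes on to use.

```ruby
# Approximation of the kind of pattern described in the post; not Rails' exact Regexp.
LOOSE = /Element\.update.*advanced_filter/

good   = 'Element.update("advanced_filter", "<p>filters</p>");'
# Updates a *different* element, but its payload mentions advanced_filter.
fooled = 'Element.update("sidebar", "see advanced_filter");'

puts LOOSE.match?(good)    # true
puts LOOSE.match?(fooled)  # true, even though advanced_filter was never updated

# Hypothetical stricter check: collect the first quoted argument of each
# Element.update(...) call so element ids can be compared exactly.
UPDATE_TARGET = /Element\.update\(\s*["']([^"']+)["']/

def updated_ids(js)
  js.scan(UPDATE_TARGET).flatten
end

p updated_ids(good)    # ["advanced_filter"]
p updated_ids(fooled)  # ["sidebar"]
```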
Just now, while writing this post, I got Aaron Patterson's rKelly working in an assert_rjs clone. rKelly uses racc to parse and evaluate JavaScript. This matches our goal of _unit_ testing soft targets. Watir, Selenium, etc. are all great, and they introduced a generation to testing in general. Buuuuuut they work thru the browser. We are not inventing Ajax itself; we just need to accurately spot-check that our own data go into the correct slots in our JavaScript payloads.

So here's a test that simulates a Rails functional test with xhr :get :

    @response = OpenStruct.new(:body => "Element.update(\"label_7\", \"I want a pet < than a chihuahua\");")
    assert_rjs :replace_html, :label_7

K, so far that looks like the original assert_rjs. But under the hood, it actually lexed the Element.update() call:

    # target, matcher and ast are set up by the surrounding assert_rjs code (not shown)
    ast.pointcut('Element.update()').matches.each do |updater|
      updater.grep(RKelly::Nodes::ArgumentsNode).each do |thang|
        div_id, html = thang.value
        if target and html
          div_id = eval(div_id.value)
          html = eval(html.value)
          if div_id == target.to_s
            assert_match matcher, html
          end
        end
      end
    end

(Open question to Aaron: is that the best way to run the query?)

The test actually determines that we really got hold of the Element.update('label_7', ...). No other JavaScript line will match. Here are the assertions to match the text payload:

    assert_rjs :replace_html, :label_7, /Top_Ranking/
    assert_rjs :replace_html, :label_7, /pet < than a chihuahua/

Ho hum; so far assert_rjs Classic could have done all that. But... because I have an exact string, not a rough match, I can now treat it as pure HTML, and I can drop it into the mighty assert_xhtml()! Now the assertion looks like this:

    assert_rjs :replace_html, :label_7 do
      input.Top_Ranking! :type => :checked, :value => :Y
      input.cross_sale_1 :type => :hidden, :value => 7
    end

From here, no matter how complex that rendered partial is, the assertion can keep up with it, and help make it safe to refactor and upgrade.
BTW, Verify and Testy can get on board if they A> import Test::Unit::Assertions (like certain other test rigs we could mention should), and B> implement flunk(). Those two are all a custom assertion should ever need...

--
Phlip
http://www.zeroplayer.com/

From: phlip2005 at gmail.com (Phlip)
Date: Sat, 04 Apr 2009 05:59:29 -0700
Subject: [Nokogiri-talk] to_xhtml should not emit

Nokogiri:

My assert_xhtml failure message formerly used to_html:

    "\n\n...in this sample...\n\n" +
      sample.to_html

That printed:

    ...in this sample...

But 'checked' is too Last Millennium. So I upgrade to to_xhtml...

...and get the same thing. Ouch! use to_xml??

From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 4 Apr 2009 14:26:32 -0700
Subject: [Nokogiri-talk] to_xhtml should not emit

On Sat, Apr 4, 2009 at 5:59 AM, Phlip wrote:
> But 'checked' is too Last Millennium. So I upgrade to to_xhtml...
>
> ...and get the same thing. Ouch! use to_xml??

What version of libxml2 are you using?

--
Aaron Patterson
http://tenderlovemaking.com/

From: phlip2005 at gmail.com (Phlip)
Date: Sat, 04 Apr 2009 20:09:49 -0700
Subject: [Nokogiri-talk] to_xhtml should not emit

> What version of libxml2 are you using?

    $ aptitude show libxml2
    Package: libxml2
    State: installed
    Automatically installed: no
    Version: 2.6.32.dfsg-5ubuntu3

And I suspect something inside it just screwed up my libxml-ruby. Who needs monkey patching when you can just sudo kate the source??! (-:

From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Mon, 6 Apr 2009 09:39:33 -0700
Subject: [Nokogiri-talk] to_xhtml should not emit

On Sat, Apr 4, 2009 at 8:09 PM, Phlip wrote:
> Version: 2.6.32.dfsg-5ubuntu3
>
> And I suspect something inside it just screwed up my libxml-ruby.

Would you mind giving it a whirl with libxml 2.7.3? IIRC, the XHTML functionality is busted in the 2.6.* series. 2.7.3 provides valid XHTML for me.

--
Aaron Patterson
http://tenderlovemaking.com/

From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Thu, 9 Apr 2009 23:22:38 -0700
Subject: [Nokogiri-talk] Comparing documents

Hey,

Is there an easy way to compare Nokogiri Documents?

The idea here is that I am trying to build some XML with the builder, and, to make sure I am building it correctly, I parse an XML chunk that should be the result and compare it to what the builder produced. I could compare the string versions of the docs, but then I get failures from the slightest whitespace difference, which is not relevant.

Initially, I thought I could easily do it recursively by comparing nodes... but node comparison fails even if 2 nodes have the same name and attributes:

    doc = Nokogiri::XML::Document.new
    =>
    n = Nokogiri::XML::Element.new("salut", doc)
    =>
    n["toto"] = "tata"
    => "tata"
    n
    =>
    m = Nokogiri::XML::Element.new("salut", doc)
    =>
    m["toto"] = "tata"
    => "tata"
    m == n
    => false

Any help here?

Thanks a bunch!

Julien

--
Julien Genestoux, Notifixio.us
http://twitter.com/julien51
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29

From: mike.dalessio at gmail.com (Mike Dalessio)
Date: Fri, 10 Apr 2009 07:56:56 -0400
Subject: [Nokogiri-talk] Comparing documents

Julien -

Check out Aaron's tree_diff.

http://github.com/tenderlove/tree_diff/tree

-m

--
mike dalessio
mike at csa.net

From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Fri, 10 Apr 2009 08:46:29 -0700
Subject: [Nokogiri-talk] Comparing documents

On Thu, Apr 9, 2009 at 11:22 PM, Julien Genestoux wrote:
> m == n
> => false
>
> Any help here?

As far as the XML document is concerned, no two nodes are ever equal. Every node in a document is different. Every node has many attributes to compare:

1. Is the name the same?
2. How about attributes?
3. How about the namespace?
4. What about number of children?
5. Are all the children the same?
6. Is its parent node the same?
7. What about its position relative to sibling nodes?

Think about adding two nodes to the same document. They can *never* have the same position relative to sibling nodes, therefore two nodes in a document cannot be "equal".

You *can*, however, compare two different documents. But you need to answer those 7 questions yourself as you're walking the two trees. Your requirements for sameness may differ from others'.

I wouldn't be opposed to implementing a =~ on Node that did this comparison, but was very strict about those questions. You could do stuff like:

    doc1 =~ doc2 # => true
    doc2 =~ doc3 # => false

As long as it only returned true or false. How does that sound?

--
Aaron Patterson
http://tenderlovemaking.com/

From: phlip2005 at gmail.com (Phlip)
Date: Fri, 10 Apr 2009 11:43:51 -0700
Subject: [Nokogiri-talk] Comparing documents

Aaron Patterson wrote:
> As far as the XML document is concerned, no two nodes are ever equal.
> Every node in a document is different. Every node has many attributes
> to compare:

The algorithm inside assert_xhtml essentially detects when one XML is a subset of the other. Reducing its fuzziness would produce a more exact match detector...
From: phlip2005 at gmail.com (Phlip)
Date: Fri, 10 Apr 2009 14:33:17 -0700
Subject: [Nokogiri-talk] Comparing documents

Julien Genestoux wrote:
> Thank you guys for the responses...
>
> It is actually for rspec purposes that I need to compare an expected doc
> and the actual generated.

I suspect assert_xhtml can do this. Install the gems nokogiri and assert2, then:

    require 'assert2/xhtml'

>
> Aaron, here is how I see the 7 questions :
> 1. Is the name the same? yes, it should
> 2. How about attributes? same
> 3. How about the namespace? same
> 4. What about number of children? same
> 5. Are all the children the same? same
> 6. Is it's parent node the same? same
> 7. What about it's position relative to sibling nodes? nope!

Then...

    @response.body.should be_html_with{
      xml_tag :attribute1 => 'value1' do
        nested_tag :attribute2 => 'value2',
                   :xpath! => '42 = count(*) and parent::xml_tag and position() = 0'
      end
    }

Ask if that doesn't work; it's exactly what I invented assert_xhtml for, with the slight matter that I will drop your XML into Nokogiri::HTML. That may or may not be a problem!

Naturally, this library needs a .be_xml_with{}, for completeness.
From aaron.patterson at gmail.com Thu Apr 16 13:13:05 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Thu, 16 Apr 2009 10:13:05 -0700 Subject: [Nokogiri-talk] Help with XML Builder In-Reply-To: References: Message-ID: <6959e1680904161013j7b8fde72oc06f02e6c69e9658@mail.gmail.com> On Thu, Apr 16, 2009 at 4:11 AM, Antel wrote: > I've this hash: > @@node = {:nodes=>["main_question", "best_answer", "other_answer"], > :rootnodes=>["questions_main", "main_content", "main_q1", "main_q2", > "close_node", "best_a1", "close_node", "close_node", "main_answers", > "close_node", "close_node", "close_node"]} Is there any way you can merge those two arrays before building up your XML? Having the two separate arrays makes things difficult. If it was one array, you could do this: http://rafb.net/p/Fbzada27.html -- Aaron Patterson http://tenderlovemaking.com/ From byrnejb at harte-lyne.ca Thu Apr 16 16:49:02 2009 From: byrnejb at harte-lyne.ca (James B. Byrne) Date: Thu, 16 Apr 2009 16:49:02 -0400 (EDT) Subject: [Nokogiri-talk] Need basic xml help Message-ID: <57029.216.185.71.24.1239914942.squirrel@webmail.harte-lyne.ca> I am trying to parse an xml:RDF document. This is completely new for me and I am struggling. Being a cucumber user I was already aware of nokogiri and so I decided to push ahead using nokogiri first. My xml document is a central bank rss feed of exchange rates. It looks like this. Bank of Canada: Noon Foreign Exchange Rates ... ... en 2009-04-16 CA: 0.8290 USD = 1 CAD 2009-04-16 ... http://www.bankofcanada.ca/fx/daily_noon.html 1 Canadian Dollar = 0.8290 USD ... en 2009-04-16 text/html CA CAD USD 0.8290 noon statistics ... # ~ 50 of these entries All I have managed to do is load the document and then I am stuck. >> fx = Nokogiri::XML(open( 'http://www.bankofcanada.ca/rss/fx/noon/fx-noon-all.xml')) I can see the document if I call fx.to_xml but I cannot figure out how to navigate through it. 
The final result that I am trying to achieve is extract the exchange rate for a given currency on a given day and put it into an ActiveRecord model for loading into a database. So the bits I am interested in are: , , , , and . Presumably, once I discover how to reach any of the interior tags then I will be able to use the same technique to reach nay others. I am working in the console to get some sense of how this is supposed to work. However, fx.child or fx.children return nil. fx.content shows the entire file. So does fx.root. fx.next yields nil. What I need is a brief set of instructions on how to navigate this document using nokogiri. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB at Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3 From aaron.patterson at gmail.com Thu Apr 16 17:10:24 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Thu, 16 Apr 2009 14:10:24 -0700 Subject: [Nokogiri-talk] Need basic xml help In-Reply-To: <57029.216.185.71.24.1239914942.squirrel@webmail.harte-lyne.ca> References: <57029.216.185.71.24.1239914942.squirrel@webmail.harte-lyne.ca> Message-ID: <6959e1680904161410u28917c66uc741688a9b896c3a@mail.gmail.com> On Thu, Apr 16, 2009 at 1:49 PM, James B. Byrne wrote: > I am trying to parse an xml:RDF document. ?This is completely new > for me and I am struggling. ?Being a cucumber user I was already > aware of nokogiri and so I decided to push ahead using nokogiri > first. > > My xml document is a central bank rss feed of exchange rates. It > looks like this. > > > xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" > ... # many more namespaces declared > > Bank of Canada: Noon Foreign Exchange Rates > ... > ? > ? ? > ? ? ? ? > ? ? ? ?... > ? ? ? ? > ? > ? > ? ? en > ? ? 2009-04-16 > > > ? ? ? ?CA: 0.8290 USD = 1 CAD 2009-04-16 ... > ? ? ? ?http://www.bankofcanada.ca/fx/daily_noon.html > ? ? ? 
?1 Canadian Dollar = 0.8290 USD ... > ? ? ? ?en > ? ? ? ?2009-04-16 > ? ? ? ?text/html > ? ? ? ?CA > > ? ? ? ?CAD > ? ? ? ?USD > ? ? ? ?0.8290 > ? ? ? ?noon > ? ? ? ?statistics > > > > ?... # ~ 50 of these entries > > > > > All I have managed to do is load the document and then I am stuck. > >>> fx = Nokogiri::XML(open( > ? ? ? 'http://www.bankofcanada.ca/rss/fx/noon/fx-noon-all.xml')) > > I can see the document if I call fx.to_xml but I cannot figure out > how to navigate through it. ?The final result that I am trying to > achieve is extract the exchange rate for a given currency on a given > day and put it into an ActiveRecord model for loading into a > database. > > So the bits I am interested in are: , , > , , and . ?Presumably, > once I discover how to reach any of the interior tags then I will be > able to use the same technique to reach nay others. > > I am working in the console to get some sense of how this is > supposed to work. ?However, fx.child or fx.children return nil. > fx.content shows the entire file. So does fx.root. ?fx.next yields > nil. fx is pointing at the document. Try something like 'fx.root.children.length' and see what that returns. > What I need is a brief set of instructions on how to navigate this > document using nokogiri. If you're familiar with CSS or XPath, it should be pretty easy to navigate. Take a look at the synopsis in the README: http://nokogiri.rubyforge.org/nokogiri/ Or also the xpath or css method documentation: http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Node.html Also, try something like this: fx.css('item').each { |item| p item.children } -- Aaron Patterson http://tenderlovemaking.com/ From aaron.patterson at gmail.com Thu Apr 16 17:18:12 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Thu, 16 Apr 2009 14:18:12 -0700 Subject: [Nokogiri-talk] How to convert this XML parsing code to use a stream? 
In-Reply-To:
References:
Message-ID: <6959e1680904161418h77fa4ebbu74e4e0c082b45900@mail.gmail.com>

On Thu, Apr 16, 2009 at 2:03 PM, Jed Hurt wrote:
> I am trying to parse the Netflix catalog index XML file to cherry pick the
> movies that are available to watch instantly[1][2]. The XML file provided by
> Netflix contains every movie in Netflix's catalog and is huge (289MB). I
> wrote some Nokogiri code to find all movies available instantly and ran it
> against a small sample XML file to test it. It works well.
> The problem is that when I run the code against the actual file, my computer
> grinds to a crashing halt and ruby starts throwing malloc errors. Here's
> the code and the sample XML file:
> Code: http://pastie.textmate.org/445794
> Sample XML: http://pastie.textmate.org/445796
> Would someone be willing to convert the code to use streaming
> (one  at a time) to mitigate the memory issues?

Converting your code is non-trivial work.  Try starting with
Nokogiri::XML::SAX::Parser.  Basically, you'll need to create a class
that inherits from Nokogiri::XML::SAX::Document and pass that off to
the SAX parser.  Then give the SAX parser your IO stream and it will
call callbacks on your document class.

Take a look at the tests for an example:

  http://github.com/tenderlove/nokogiri/blob/99abb9fb042238004b779317757c8480ed2f143d/test/xml/sax/test_parser.rb

Paul Dix makes heavy use of the SAX parser too, so you might want to
check out his project:

  http://github.com/pauldix/feedzirra/tree/master

--
Aaron Patterson
http://tenderlovemaking.com/

From byrnejb at harte-lyne.ca  Thu Apr 16 20:35:14 2009
From: byrnejb at harte-lyne.ca (James B. Byrne)
Date: Thu, 16 Apr 2009 20:35:14 -0400 (EDT)
Subject: [Nokogiri-talk] Need basic xml help
In-Reply-To: <6959e1680904161410u28917c66uc741688a9b896c3a@mail.gmail.com>
References: <57029.216.185.71.24.1239914942.squirrel@webmail.harte-lyne.ca>
        <6959e1680904161410u28917c66uc741688a9b896c3a@mail.gmail.com>
Message-ID: <61918.69.157.29.96.1239928514.squirrel@webmail.harte-lyne.ca>

On Thu, April 16, 2009 17:10, Aaron Patterson wrote:
> Also, try something like this:
>
>   fx.css('item').each { |item| p item.children }
>

Thank you.  I have already found and read the references that you
give.

What I am trying to get an example of is the type of construction
that would do this:

  fx = Nokogiri::XML(open(
        'http://www.bankofcanada.ca/rss/fx/noon/fx-noon-all.xml'))

  fx.xpath(???).each do |xchg|
    cc = CurrencyExChg.new
    cc.currency_base = xchg.??
    cc.currency_target = xchg.??
    cc.currency_xchg_rate = xchg.??
    cc.currency_xchg_date = xchg.??
    cc.save!
  end

To begin with, I can get nowhere using xpath on the document I am
working with.

  >> fx.xpath('//item').each do
  ?> cnt = cnt -1
  >> puts cnt
  >> end
  => 0
  >>
  ?> fx.css('item').each do
  ?> cnt = cnt - 1
  >> puts cnt
  >> end
  -1
  -2
  -3
  ...

Now, my problem is that even if I use css and an iterator construct
like this:

  fx.css('item').each do |rate|

I do not know how I get the data elements I desire out of the rate
variable.  I cannot even grasp what the rate object is or what it
contains.

In my head I imagined that one of the xml parsers available for Ruby
would take an xml doc and turn it into a nested hash or array, so that
for the example I gave in my previous message I would obtain something
akin to:

  fx = item[
    {:date => '2009-04-16', :base_currency => 'CAD',
     :target_currency => 'USD', :value => '0.8290', ...},
    {:date => '2009-04-16', :base_currency => 'CAD',
     :target_currency => 'GBP', :value => '1.6354', ...},
    ...
  ]

which I could at least examine in console and could map to the setter
attributes of an ActiveRecord class in a fairly straightforward manner.
As it is, I cannot seem to find any methods that display the data
structure I am working with in a manner from which I can extract the
relevant parts.

I have gone through three or four tutorials now, including a good one
at http://www.zvon.org/xxl/XPathTutorial/ which explains the xpath
hierarchy very well.  Unfortunately, with nokogiri all I can
accomplish with fx.xpath is to dump the entire document.  It does not
seem to matter what I provide as arguments.

  ?> fx.xpath('//*item').each do
  ?> cnt = cnt + 1
  >> puts cnt
  >> end
  Nokogiri::XML::XPath::SyntaxError: Invalid expression
  ?> fx.xpath('//item').each do
  ?> cnt = cnt + 1
  >> puts cnt
  >> end
  => 0
  ?> fx.xpath('item').each do
  ?> cnt = cnt + 1
  >> puts cnt
  >> end
  => 0

Now, I realize that this is due to ignorance on my part.  But I really
cannot figure out how to obtain what I desire from the abbreviated
examples that I can find.  I need an example of how to pull successive
sets of information out of that xml document.  I know that it is
possible.  I believe that Ruby and nokogiri can do it.  I just need
instruction on how it is done.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne                mailto:ByrneJB at Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

From phlip2005 at gmail.com  Fri Apr 17 00:47:55 2009
From: phlip2005 at gmail.com (Phlip)
Date: Thu, 16 Apr 2009 21:47:55 -0700
Subject: [Nokogiri-talk] Need basic xml help
In-Reply-To: <61918.69.157.29.96.1239928514.squirrel@webmail.harte-lyne.ca>
References: <57029.216.185.71.24.1239914942.squirrel@webmail.harte-lyne.ca>
        <6959e1680904161410u28917c66uc741688a9b896c3a@mail.gmail.com>
        <61918.69.157.29.96.1239928514.squirrel@webmail.harte-lyne.ca>
Message-ID: <49E809FB.4050908@gmail.com>

James B. Byrne wrote:

> What I am trying to get an example of is the type of construction
> that would do this:

Here's how far I got at first crack:

  fx = Nokogiri::XML(File.read('fx-noon-all.xml'))

  fx.xpath('/rdf:RDF/*').each do |xchg|
    p xchg.name
    p xchg.path
  end

I don't want to use /* - I want to use /item.  But it does not work,
and the above returns this:

  "channel"
  "/rdf:RDF/*[1]"
  "item"
  "/rdf:RDF/*[2]"
  "item"
  "/rdf:RDF/*[3]"
  "item"
  ...

So it seems there's an issue with the namespace.  (Firefox XPath
Checker and XPather both broke on that XML - dunno why, but different
XML implementation.)

So this hack tends to work:

  fx.xpath('/rdf:RDF/*').each do |xchg|
    if xchg.name == 'item'
      p xchg.xpath('cb:baseCurrency').text
      p xchg.xpath('cb:targetCurrency').text
      p xchg.xpath('cb:value').text
      p xchg.xpath('dc:date').text
    end
  end

> Now, I realize that this is due to ignorance on my part.

Though I have never used XPath with namespaces (I mostly attack XHTML
with it), I have written enough XPaths to suspect a bug in libxml2
here...

--
Phlip
http://flea.sourceforge.net/resume.html

From byrnejb at harte-lyne.ca  Fri Apr 17 10:21:44 2009
From: byrnejb at harte-lyne.ca (James B. Byrne)
Date: Fri, 17 Apr 2009 10:21:44 -0400 (EDT)
Subject: [Nokogiri-talk] Need basic xml help
In-Reply-To: <49E809FB.4050908@gmail.com>
References: <57029.216.185.71.24.1239914942.squirrel@webmail.harte-lyne.ca>
        <6959e1680904161410u28917c66uc741688a9b896c3a@mail.gmail.com>
        <61918.69.157.29.96.1239928514.squirrel@webmail.harte-lyne.ca>
        <49E809FB.4050908@gmail.com>
Message-ID: <38993.216.185.71.24.1239978104.squirrel@webmail.harte-lyne.ca>

On Fri, April 17, 2009 00:47, Phlip wrote:
>
> So this hack tends to work:
>
>   fx.xpath('/rdf:RDF/*').each do |xchg|
>     if xchg.name == 'item'
>       p xchg.xpath('cb:baseCurrency').text
>       p xchg.xpath('cb:targetCurrency').text
>       p xchg.xpath('cb:value').text
>       p xchg.xpath('dc:date').text
>     end
>   end
>

Thank you ever so much.
This is exactly the type of example that I was looking for and either
did not recognize or could not find.

re: libxml2 - this is what I have:

  libxml2.x86_64  2.6.26-2.1.2.7 el5 (CentOS-5.3)

Again, thanks.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne                mailto:ByrneJB at Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

From aaron.patterson at gmail.com  Fri Apr 17 13:11:44 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Fri, 17 Apr 2009 10:11:44 -0700
Subject: [Nokogiri-talk] bug tracker is changing
Message-ID: <6959e1680904171011l56b4337eh1d3819de4b1cb667@mail.gmail.com>

Hi everyone!

I am changing the bug tracker to use github issues.  Please do not
file any more tickets in lighthouse.  The new ticket tracker is here:

  http://github.com/tenderlove/nokogiri/issues

I will attempt to resolve the rest of the issues on lighthouse, but
from this point forward, please use the github issue tracker.

Again, the url is:

  http://github.com/tenderlove/nokogiri/issues

Thanks everyone!

--
Aaron Patterson
http://tenderlovemaking.com/

From adam.vandenhoven at gmail.com  Fri Apr 17 19:01:08 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Fri, 17 Apr 2009 16:01:08 -0700
Subject: [Nokogiri-talk] Memory problem with Nokogiri and rack
Message-ID: <1240009268.6904.161.camel@vandenhoven>

Hey all,

I'm building a little sinatra app that uses nokogiri to load a third
party site, make a few changes and then send out the result (it's a
simplistic iPhone simulator).
When I run it from my laptop (running ubuntu), I get the following
error:

        *** glibc detected *** /usr/bin/ruby1.8: double free or corruption (!prev): 0x087e05c0 ***
        ======= Backtrace: =========
        /lib/tls/i686/cmov/libc.so.6[0xb7ca6454]
        /lib/tls/i686/cmov/libc.so.6(cfree+0x96)[0xb7ca84b6]
        /usr/lib/libruby1.8.so.1.8(ruby_xfree+0x37)[0xb7e570a7]
        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x20e)[0xb762face]
        /usr/lib/libxml2.so.2(xmlFreeProp+0x5a)[0xb762ffea]
        /usr/lib/libxml2.so.2(xmlFreePropList+0x1b)[0xb76302bb]
        /usr/lib/libxml2.so.2(xmlFreeNodeList+0xa2)[0xb762f962]
        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
        /usr/lib/libxml2.so.2(xmlFreeDoc+0xbc)[0xb762f77c]
        /usr/lib/ruby/gems/1.8/gems/nokogiri-1.2.3/lib/nokogiri/native.so[0xb7ae80c9]
        /usr/lib/libruby1.8.so.1.8(rb_gc_call_finalizer_at_exit+0xa7)[0xb7e573a7]
        /usr/lib/libruby1.8.so.1.8[0xb7e3bbb7]
        /usr/lib/libruby1.8.so.1.8(ruby_cleanup+0x1a2)[0xb7e48a52]
        /usr/lib/libruby1.8.so.1.8(ruby_stop+0x1d)[0xb7e48b7d]
        /usr/lib/libruby1.8.so.1.8[0xb7e50021]
        /usr/bin/ruby1.8[0x804870d]
        /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7c4d685]
        /usr/bin/ruby1.8[0x8048621]

Now I'm not sure that's a very good thing and I'm not sure I should put
it up on Dreamhost in my dev location unless someone has an idea of what
the problem is.

Any thoughts?
--
Adam van den Hoven
Hybrid Web Developer
Little Fyr Media
p: 604.618.0845
e: adam.vandenhoven at gmail.com
w: http://www.littlefyr.com

From aaron.patterson at gmail.com  Fri Apr 17 19:58:55 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Fri, 17 Apr 2009 16:58:55 -0700
Subject: [Nokogiri-talk] Memory problem with Nokogiri and rack
In-Reply-To: <1240009268.6904.161.camel@vandenhoven>
References: <1240009268.6904.161.camel@vandenhoven>
Message-ID: <6959e1680904171658o540c2f8xc3261b196a7fb71b@mail.gmail.com>

On Fri, Apr 17, 2009 at 4:01 PM, Adam van den Hoven wrote:
> Hey all,
>
> I'm building a little sinatra app that uses nokogiri to load a third
> party site, make a few changes and then send out the result (it's a
> simplistic iPhone simulator).  When I run it from my laptop (running
> ubuntu), I get the following error:
>
>        *** glibc detected *** /usr/bin/ruby1.8: double free or corruption (!prev): 0x087e05c0 ***
>        ======= Backtrace: =========
>        /lib/tls/i686/cmov/libc.so.6[0xb7ca6454]
>        /lib/tls/i686/cmov/libc.so.6(cfree+0x96)[0xb7ca84b6]
>        /usr/lib/libruby1.8.so.1.8(ruby_xfree+0x37)[0xb7e570a7]
>        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x20e)[0xb762face]
>        /usr/lib/libxml2.so.2(xmlFreeProp+0x5a)[0xb762ffea]
>        /usr/lib/libxml2.so.2(xmlFreePropList+0x1b)[0xb76302bb]
>        /usr/lib/libxml2.so.2(xmlFreeNodeList+0xa2)[0xb762f962]
>        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
>        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
>        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
>        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
>        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
>        /usr/lib/libxml2.so.2(xmlFreeDoc+0xbc)[0xb762f77c]
>        /usr/lib/ruby/gems/1.8/gems/nokogiri-1.2.3/lib/nokogiri/native.so[0xb7ae80c9]
>        /usr/lib/libruby1.8.so.1.8(rb_gc_call_finalizer_at_exit+0xa7)[0xb7e573a7]
>        /usr/lib/libruby1.8.so.1.8[0xb7e3bbb7]
>        /usr/lib/libruby1.8.so.1.8(ruby_cleanup+0x1a2)[0xb7e48a52]
>        /usr/lib/libruby1.8.so.1.8(ruby_stop+0x1d)[0xb7e48b7d]
>        /usr/lib/libruby1.8.so.1.8[0xb7e50021]
>        /usr/bin/ruby1.8[0x804870d]
>        /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7c4d685]
>        /usr/bin/ruby1.8[0x8048621]
>
> Now I'm not sure that's a very good thing and I'm not sure I should put
> it up on Dreamhost in my dev location unless someone has an idea of what
> the problem is.
>
> Any thoughts?

Well, it's either a bug in nokogiri or a bug in libxml2.  What version
of libxml2 are you using?

--
Aaron Patterson
http://tenderlovemaking.com/

From adam.vandenhoven at gmail.com  Mon Apr 20 02:27:35 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Sun, 19 Apr 2009 23:27:35 -0700
Subject: [Nokogiri-talk] Memory problem with Nokogiri and rack
In-Reply-To: <6959e1680904171658o540c2f8xc3261b196a7fb71b@mail.gmail.com>
References: <1240009268.6904.161.camel@vandenhoven>
        <6959e1680904171658o540c2f8xc3261b196a7fb71b@mail.gmail.com>
Message-ID: <1240208855.7836.11.camel@vandenhoven>

synaptic says:

2.6.32.dfsg-4ubuntu1.1

--
Adam van den Hoven
Hybrid Web Developer
Little Fyr Media
p: 604.618.0845
e: adam.vandenhoven at gmail.com
w: http://www.littlefyr.com

On Fri, 2009-04-17 at 16:58 -0700, Aaron Patterson wrote:
> On Fri, Apr 17, 2009 at 4:01 PM, Adam van den Hoven
> wrote:
> > Hey all,
> >
> > I'm building a little sinatra app that uses nokogiri to load a third
> > party site, make a few changes and then send out the result (it's a
> > simplistic iPhone simulator).
> > When I run it from my laptop (running
> > ubuntu), I get the following error:
> >
> >        *** glibc detected *** /usr/bin/ruby1.8: double free or corruption (!prev): 0x087e05c0 ***
> >        ======= Backtrace: =========
> >        /lib/tls/i686/cmov/libc.so.6[0xb7ca6454]
> >        /lib/tls/i686/cmov/libc.so.6(cfree+0x96)[0xb7ca84b6]
> >        /usr/lib/libruby1.8.so.1.8(ruby_xfree+0x37)[0xb7e570a7]
> >        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x20e)[0xb762face]
> >        /usr/lib/libxml2.so.2(xmlFreeProp+0x5a)[0xb762ffea]
> >        /usr/lib/libxml2.so.2(xmlFreePropList+0x1b)[0xb76302bb]
> >        /usr/lib/libxml2.so.2(xmlFreeNodeList+0xa2)[0xb762f962]
> >        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
> >        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
> >        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
> >        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
> >        /usr/lib/libxml2.so.2(xmlFreeNodeList+0x80)[0xb762f940]
> >        /usr/lib/libxml2.so.2(xmlFreeDoc+0xbc)[0xb762f77c]
> >        /usr/lib/ruby/gems/1.8/gems/nokogiri-1.2.3/lib/nokogiri/native.so[0xb7ae80c9]
> >        /usr/lib/libruby1.8.so.1.8(rb_gc_call_finalizer_at_exit+0xa7)[0xb7e573a7]
> >        /usr/lib/libruby1.8.so.1.8[0xb7e3bbb7]
> >        /usr/lib/libruby1.8.so.1.8(ruby_cleanup+0x1a2)[0xb7e48a52]
> >        /usr/lib/libruby1.8.so.1.8(ruby_stop+0x1d)[0xb7e48b7d]
> >        /usr/lib/libruby1.8.so.1.8[0xb7e50021]
> >        /usr/bin/ruby1.8[0x804870d]
> >        /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7c4d685]
> >        /usr/bin/ruby1.8[0x8048621]
> >
> > Now I'm not sure that's a very good thing and I'm not sure I should put
> > it up on Dreamhost in my dev location unless someone has an idea of what
> > the problem is.
> >
> > Any thoughts?
>
> Well, it's either a bug in nokogiri or a bug in libxml2.  What version
> of libxml2 are you using?
>
>

From byrnejb at harte-lyne.ca  Mon Apr 20 16:59:13 2009
From: byrnejb at harte-lyne.ca (James B. Byrne)
Date: Mon, 20 Apr 2009 16:59:13 -0400 (EDT)
Subject: [Nokogiri-talk] Ruby Rails and Nokogiri
Message-ID: <58868.216.185.71.24.1240261153.squirrel@webmail.harte-lyne.ca>

I am trying to use nokogiri in a class that pulls and parses an xml
feed.  It is intended that the class be used in a stand-alone Ruby
process run via cron.  If I use the class in script/console then
everything works fine.  If instead I put it into a stand-alone script
then I see this error:

  $ bin/hll_forex_ca_feed.rb
  "--- !ruby/object:ForexCASource \nforex: !ruby/object:Nokogiri::XML::Document \n decorators: \n errors: []\n\n"

The script is just this:

  #!/usr/bin/env ruby
  require File.dirname(__FILE__) + '/../config/environment'
  require 'rubygems'
  require 'active_record'
  require 'forex_ca_source'

  fx = ForexCASource.new
  ...

The class looks like this:

  require 'nokogiri'
  require 'open-uri'

  class ForexCASource

    include Nokogiri::XML

    FOREX_SITE = \
      'http://www.bankofcanada.ca/rss/fx/noon/fx-noon-all.xml'

    def initialize(source=nil)
      return xchg_source unless source
      return xchg_source(source)
    end

    def xchg_source(source=FOREX_SITE)
      @forex = Nokogiri::XML(open(source))
    rescue Exception => e
      Rails::logger.error(
        "ForexCASource unable to open #{source} \n #{e}")
      raise e
    end
    ...
  end

Any idea of what I am doing wrong in setting up the environment for
nokogiri that script/console takes care of?

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne                mailto:ByrneJB at Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

From aaron.patterson at gmail.com  Mon Apr 20 17:26:06 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Mon, 20 Apr 2009 14:26:06 -0700
Subject: [Nokogiri-talk] Memory problem with Nokogiri and rack
In-Reply-To: <1240208855.7836.11.camel@vandenhoven>
References: <1240009268.6904.161.camel@vandenhoven>
        <6959e1680904171658o540c2f8xc3261b196a7fb71b@mail.gmail.com>
        <1240208855.7836.11.camel@vandenhoven>
Message-ID: <6959e1680904201426mfe1e0d6v9ada9806719e574d@mail.gmail.com>

Hi Adam,

On Sun, Apr 19, 2009 at 11:27 PM, Adam van den Hoven
wrote:
> synaptic says:
>
> 2.6.32.dfsg-4ubuntu1.1

Mailman blocked your last email because the attachment was too big.

I downloaded the app and tried it out, but I can't seem to reproduce
the problem.

Does it happen every request, or just sometimes?

Is the document you parse the same every time?

Are you able to reproduce the bug in a stand-alone ruby script?

--
Aaron Patterson
http://tenderlovemaking.com/

From byrnejb at harte-lyne.ca  Mon Apr 20 17:43:30 2009
From: byrnejb at harte-lyne.ca (James B. Byrne)
Date: Mon, 20 Apr 2009 17:43:30 -0400 (EDT)
Subject: [Nokogiri-talk] [SOLVED] Re: Ruby Rails and Nokogiri
In-Reply-To: <58868.216.185.71.24.1240261153.squirrel@webmail.harte-lyne.ca>
References: <58868.216.185.71.24.1240261153.squirrel@webmail.harte-lyne.ca>
Message-ID: <46729.216.185.71.24.1240263810.squirrel@webmail.harte-lyne.ca>

Found my mistake.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne                mailto:ByrneJB at Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

From adam.vandenhoven at gmail.com  Mon Apr 20 17:47:34 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Mon, 20 Apr 2009 14:47:34 -0700
Subject: [Nokogiri-talk] Memory problem with Nokogiri and rack
In-Reply-To: <6959e1680904201426mfe1e0d6v9ada9806719e574d@mail.gmail.com>
References: <1240009268.6904.161.camel@vandenhoven>
        <6959e1680904171658o540c2f8xc3261b196a7fb71b@mail.gmail.com>
        <1240208855.7836.11.camel@vandenhoven>
        <6959e1680904201426mfe1e0d6v9ada9806719e574d@mail.gmail.com>
Message-ID: <1240264054.7108.149.camel@vandenhoven>

Aaron,

I didn't intend to send it to the list but to Mike Dalessio directly
because it was too big.

For me, the problem arises when I request it a few times and rack
terminates.  However if I quit rackup (^C) I see the error I reported.

--
Adam van den Hoven
Hybrid Web Developer
Little Fyr Media
p: 604.618.0845
e: adam.vandenhoven at gmail.com
w: http://www.littlefyr.com

On Mon, 2009-04-20 at 14:26 -0700, Aaron Patterson wrote:
> Hi Adam,
>
> On Sun, Apr 19, 2009 at 11:27 PM, Adam van den Hoven
> wrote:
> > synaptic says:
> >
> > 2.6.32.dfsg-4ubuntu1.1
>
> Mailman blocked your last email because the attachment was too big.
>
> I downloaded the app and tried it out, but I can't seem to reproduce
> the problem.
>
> Does it happen every request, or just sometimes?
>
> Is the document you parse the same every time?
>
> Are you able to reproduce the bug in a stand-alone ruby script?
>

From aaron.patterson at gmail.com  Mon Apr 20 20:45:26 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Mon, 20 Apr 2009 17:45:26 -0700
Subject: [Nokogiri-talk] Memory problem with Nokogiri and rack
In-Reply-To: <1240264054.7108.149.camel@vandenhoven>
References: <1240009268.6904.161.camel@vandenhoven>
        <6959e1680904171658o540c2f8xc3261b196a7fb71b@mail.gmail.com>
        <1240208855.7836.11.camel@vandenhoven>
        <6959e1680904201426mfe1e0d6v9ada9806719e574d@mail.gmail.com>
        <1240264054.7108.149.camel@vandenhoven>
Message-ID: <6959e1680904201745h6cd70e4dlf0874cc6d4464a91@mail.gmail.com>

On Mon, Apr 20, 2009 at 2:47 PM, Adam van den Hoven wrote:
> Aaron,
>
> I didn't intend to send it to the list but to Mike Dalessio directly
> because it was too big.

Cool, no problem.  I was just letting you know.

> For me, the problem arises when I request it a few times and rack
> terminates.  However if I quit rackup (^C) I see the error I reported.

Do you know if it's parsing the same document every single time?

--
Aaron Patterson
http://tenderlovemaking.com/

From adam.vandenhoven at gmail.com  Tue Apr 21 10:55:03 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Tue, 21 Apr 2009 07:55:03 -0700
Subject: [Nokogiri-talk] Memory problem with Nokogiri and rack
In-Reply-To: <6959e1680904201745h6cd70e4dlf0874cc6d4464a91@mail.gmail.com>
References: <1240009268.6904.161.camel@vandenhoven>
        <6959e1680904171658o540c2f8xc3261b196a7fb71b@mail.gmail.com>
        <1240208855.7836.11.camel@vandenhoven>
        <6959e1680904201426mfe1e0d6v9ada9806719e574d@mail.gmail.com>
        <1240264054.7108.149.camel@vandenhoven>
        <6959e1680904201745h6cd70e4dlf0874cc6d4464a91@mail.gmail.com>
Message-ID: <1240325703.7108.1179.camel@vandenhoven>

The example I sent is doing the same thing every time (the page itself
is pretty static).  I don't know what other pages might have the
problem since it seems to be tied to some code I wrote to fix crappy
and weird content.
--
Adam van den Hoven
Hybrid Web Developer
Little Fyr Media
p: 604.618.0845
e: adam.vandenhoven at gmail.com
w: http://www.littlefyr.com

On Mon, 2009-04-20 at 17:45 -0700, Aaron Patterson wrote:
> On Mon, Apr 20, 2009 at 2:47 PM, Adam van den Hoven
> wrote:
> > Aaron,
> >
> > I didn't intend to send it to the list but to Mike Dalessio directly
> > because it was too big.
>
> Cool, no problem.  I was just letting you know.
>
> > For me, the problem arises when I request it a few times and rack
> > terminates.  However if I quit rackup (^C) I see the error I reported.
>
> Do you know if it's parsing the same document every single time?
>

From adam.vandenhoven at gmail.com  Wed Apr 22 02:12:55 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Tue, 21 Apr 2009 23:12:55 -0700
Subject: [Nokogiri-talk] Memory problem with Nokogiri and rack
In-Reply-To: <618c07250904212230h69fc63fay96cb407ba8277b08@mail.gmail.com>
References: <1240009268.6904.161.camel@vandenhoven>
        <6959e1680904171658o540c2f8xc3261b196a7fb71b@mail.gmail.com>
        <1240208855.7836.11.camel@vandenhoven>
        <6959e1680904201426mfe1e0d6v9ada9806719e574d@mail.gmail.com>
        <1240264054.7108.149.camel@vandenhoven>
        <6959e1680904201745h6cd70e4dlf0874cc6d4464a91@mail.gmail.com>
        <1240325703.7108.1179.camel@vandenhoven>
        <618c07250904212207n53af99b9y7edeb8e2cf08135f@mail.gmail.com>
        <618c07250904212230h69fc63fay96cb407ba8277b08@mail.gmail.com>
Message-ID: <1240380775.7108.1592.camel@vandenhoven>

It's nice to know that I wasn't totally insane.

It's an interesting question, however, as to what the right behaviour
is when trying to add a node that already exists in a document to some
other location in the document.  There are 4 possible things one could
do in this situation:

  1) Clone the node
  2) Unlink the node
  3) Throw an exception
  4) Silently do nothing

Now the first and last ones seem like bad ideas (yes, they're straw
men).  I lean toward the third based on the idea that a method should
do only one thing.
Right now, it is doing two, in a way that is not clear from the method
name.

Having said that, a set of "move_" methods might be in order such that

  head.add_next_sibling( body.unlink )

is the same as

  body.move_after( head )

I would then have move_after, move_before, move_to_start_of and
move_to_end_of.

But that's just me.

Thanks for the help!

--
Adam van den Hoven
Hybrid Web Developer
Little Fyr Media
p: 604.618.0845
e: adam.vandenhoven at gmail.com
w: http://www.littlefyr.com

On Wed, 2009-04-22 at 01:30 -0400, Mike Dalessio wrote:
> And, one last note, that issue is now fixed in master, and will be
> included in the next Nokogiri release.
>
>
> On Wed, Apr 22, 2009 at 1:07 AM, Mike Dalessio wrote:
> Hi Adam,
>
> You can prevent this crash from occurring by changing this
> line:
>
>     head.add_next_sibling( body.unlink )
>
> to this:
>
>     head.add_next_sibling( body )
>
> That is, remove the unlink() call.  Node#add_next_sibling does
> this implicitly for you.
>
> That said, Nokogiri certainly should be handling this more
> gracefully than it is.  I've opened a ticket on github issues
> for it:
>
>     http://github.com/tenderlove/nokogiri/issues#issue/22
>
> -mike
>
>
> On Tue, Apr 21, 2009 at 10:55 AM, Adam van den Hoven
> wrote:
> The example I sent is doing the same thing every time (the
> page itself is pretty static).  I don't know what other pages
> might have the problem since it seems to be tied to some code
> I wrote to fix crappy and weird content.
> --
> Adam van den Hoven
> Hybrid Web Developer
> Little Fyr Media
> p: 604.618.0845
> e: adam.vandenhoven at gmail.com
> w: http://www.littlefyr.com
>
>
> On Mon, 2009-04-20 at 17:45 -0700, Aaron Patterson
> wrote:
> > On Mon, Apr 20, 2009 at 2:47 PM, Adam van den Hoven
> > wrote:
> > > Aaron,
> > >
> > > I didn't intend to send it to the list but to Mike
> > > Dalessio directly because it was too big.
> >
> > Cool, no problem.  I was just letting you know.
> >
> > > For me, the problem arises when I request it a few times and rack
> > > terminates.  However if I quit rackup (^C) I see the error I
> > > reported.
> >
> > Do you know if it's parsing the same document every single time?
> >
>
> --
> mike dalessio
> mike at csa.net
>
>
> --
> mike dalessio
> mike at csa.net

From rubikitch at ruby-lang.org  Sat Apr 25 14:55:53 2009
From: rubikitch at ruby-lang.org (rubikitch at ruby-lang.org)
Date: Sun, 26 Apr 2009 03:55:53 +0900 (JST)
Subject: [Nokogiri-talk] Encoding bug
Message-ID: <20090426.035553.93256266.rubikitch@ruby-lang.org>

Hi,

There is an encoding bug in Ruby 1.9.

  # -*- coding: euc-jp -*-
  require 'rubygems'
  require 'nokogiri'
  Nokogiri::VERSION                   # => "1.2.3"
  euc = "?????"
  nokogiri = Nokogiri(euc, nil, "EUC-JP")
  x = nokogiri.at(:b).inner_text      # => "\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86\xE3\x81\x88\xE3\x81\x8A"
  x.encoding                          # => #

  require 'kconv'
  Kconv.guess(x)                      # => #
  x.force_encoding("UTF-8").encode("EUC-JP") # => "?????"

--
rubikitch
Blog: http://d.hatena.ne.jp/rubikitch/
Site: http://www.rubyist.net/~rubikitch/

From aaron.patterson at gmail.com  Sat Apr 25 15:28:06 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 25 Apr 2009 12:28:06 -0700
Subject: [Nokogiri-talk] Encoding bug
In-Reply-To: <20090426.035553.93256266.rubikitch@ruby-lang.org>
References: <20090426.035553.93256266.rubikitch@ruby-lang.org>
Message-ID: <6959e1680904251228g3cc81d6ye4c0cd5d61357227@mail.gmail.com>

2009/4/25 :
> Hi,
>
> There is an encoding bug in Ruby 1.9.
>
> # -*- coding: euc-jp -*-
> require 'rubygems'
> require 'nokogiri'
> Nokogiri::VERSION                   # => "1.2.3"
> euc = "?????"
> nokogiri = Nokogiri(euc, nil, "EUC-JP")
> x = nokogiri.at(:b).inner_text      # => "\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86\xE3\x81\x88\xE3\x81\x8A"
> x.encoding                          # => #
>
> require 'kconv'
> Kconv.guess(x)                      # => #
> x.force_encoding("UTF-8").encode("EUC-JP") # => "?????"

What version of libxml2 are you using?  I'm using 2.7.3, and here is
my output:

  # -*- coding: euc-jp -*-
  require 'rubygems'
  require 'nokogiri'
  puts Nokogiri::VERSION                    # => "1.2.3"
  puts Nokogiri::LIBXML_VERSION             # => '2.7.3'
  euc = "?????"
  nokogiri = Nokogiri(euc, nil, "EUC-JP")
  puts x = nokogiri.at(:b).inner_text       # => "?????"
  p x.encoding                              # => #

  require 'kconv'
  p Kconv.guess(x)                          # => #
  p x.force_encoding("UTF-8").encode("EUC-JP") # => "??????????"
  p x.encode("UTF-8")                       # => "?????"

  http://skitch.com/aaronp/bpbmq/terminal-bash-80x24

--
Aaron Patterson
http://tenderlovemaking.com/

From rubikitch at ruby-lang.org  Sun Apr 26 13:45:30 2009
From: rubikitch at ruby-lang.org (rubikitch at ruby-lang.org)
Date: Mon, 27 Apr 2009 02:45:30 +0900 (JST)
Subject: [Nokogiri-talk] Encoding bug
In-Reply-To: <6959e1680904251228g3cc81d6ye4c0cd5d61357227@mail.gmail.com>
References: <20090426.035553.93256266.rubikitch@ruby-lang.org>
        <6959e1680904251228g3cc81d6ye4c0cd5d61357227@mail.gmail.com>
Message-ID: <20090427.024530.175560659.rubikitch@ruby-lang.org>

From: Aaron Patterson
Subject: Re: [Nokogiri-talk] Encoding bug
Date: Sat, 25 Apr 2009 12:28:06 -0700

> What version of libxml2 are you using?  I'm using 2.7.3, and here is my output:

Me too. But the result is the same.

> p x.encoding                              # => #
>
> require 'kconv'
> p Kconv.guess(x)                          # => #

It means there is an encoding inconsistency.
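The inconsistency can be reproduced without Nokogiri. The minimal sketch below, in plain Ruby 2.0 or later, takes the 15 bytes shown in the `inner_text` output and checks them against two encoding labels. It assumes, as the `Kconv.guess` result suggests, that the bytes are UTF-8 while the string carried a different label; the variable names are illustrative only. `force_encoding` changes only the label, while `encode` actually transcodes, so the label has to be fixed before transcoding.

```ruby
# The 15 bytes from the inner_text output, as a binary (unlabeled) string.
raw = "\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86\xE3\x81\x88\xE3\x81\x8A".b

# Labeled as EUC-JP, these bytes do not form valid EUC-JP characters.
as_euc = raw.dup.force_encoding("EUC-JP")
p as_euc.valid_encoding?   # => false

# force_encoding swaps the label without touching the bytes; labeled as
# UTF-8 they are valid and decode to five hiragana characters.
as_utf8 = raw.dup.force_encoding("UTF-8")
p as_utf8.valid_encoding?  # => true
p as_utf8.length           # => 5

# Once the label matches the bytes, encode can transcode to EUC-JP
# (two bytes per hiragana character).
euc = as_utf8.encode("EUC-JP")
p euc.bytesize             # => 10
```

This is the same relabel-then-transcode step as the last line of the original report, `x.force_encoding("UTF-8").encode("EUC-JP")`.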
--
rubikitch
Blog: http://d.hatena.ne.jp/rubikitch/
Site: http://www.rubyist.net/~rubikitch/

From aaron.patterson at gmail.com  Sun Apr 26 14:33:34 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sun, 26 Apr 2009 11:33:34 -0700
Subject: [Nokogiri-talk] Encoding bug
In-Reply-To: <20090427.024530.175560659.rubikitch@ruby-lang.org>
References: <20090426.035553.93256266.rubikitch@ruby-lang.org>
        <6959e1680904251228g3cc81d6ye4c0cd5d61357227@mail.gmail.com>
        <20090427.024530.175560659.rubikitch@ruby-lang.org>
Message-ID: <6959e1680904261133x4b3f964erac86fcd3f91e8ab9@mail.gmail.com>

On Sun, Apr 26, 2009 at 10:45 AM,  wrote:
> From: Aaron Patterson
> Subject: Re: [Nokogiri-talk] Encoding bug
> Date: Sat, 25 Apr 2009 12:28:06 -0700
>
>> What version of libxml2 are you using?  I'm using 2.7.3, and here is my output:
>
> Me too. But the result is the same.
>
>> p x.encoding                              # => #
>>
>> require 'kconv'
>> p Kconv.guess(x)                          # => #
>
> It means there is an encoding inconsistency.

Can you put your test file in gist?  I may not have set up the file
correctly.

--
Aaron Patterson
http://tenderlovemaking.com/

From rubikitch at ruby-lang.org  Sun Apr 26 15:23:41 2009
From: rubikitch at ruby-lang.org (rubikitch at ruby-lang.org)
Date: Mon, 27 Apr 2009 04:23:41 +0900 (JST)
Subject: [Nokogiri-talk] Encoding bug
In-Reply-To: <6959e1680904261133x4b3f964erac86fcd3f91e8ab9@mail.gmail.com>
References: <6959e1680904251228g3cc81d6ye4c0cd5d61357227@mail.gmail.com>
        <20090427.024530.175560659.rubikitch@ruby-lang.org>
        <6959e1680904261133x4b3f964erac86fcd3f91e8ab9@mail.gmail.com>
Message-ID: <20090427.042341.120037429.rubikitch@ruby-lang.org>

From: Aaron Patterson
Subject: Re: [Nokogiri-talk] Encoding bug
Date: Sun, 26 Apr 2009 11:33:34 -0700

>>> p x.encoding                              # => #
>>>
>>> require 'kconv'
>>> p Kconv.guess(x)                          # => #
>>
>> It means there is an encoding inconsistency.
>
> Can you put your test file in gist? I may not have set up the file correctly.

http://gist.github.com/102140

--
rubikitch
Blog: http://d.hatena.ne.jp/rubikitch/
Site: http://www.rubyist.net/~rubikitch/

From aaron.patterson at gmail.com  Sun Apr 26 17:35:13 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sun, 26 Apr 2009 14:35:13 -0700
Subject: [Nokogiri-talk] paginating xml doc fragments -- help?!
In-Reply-To:
References:
Message-ID: <6959e1680904261435yaab73d4l3282e298b6f77e9e@mail.gmail.com>

On Sun, Apr 26, 2009 at 2:07 PM, Matt Mitchell wrote:
> Hi,
>
> Tenderlove gave me some ideas on how I might do this, but I still couldn't
> figure it out. Basically, I have an xml doc that is broken up by "pb" tags.
> Each "pb" tag is a page-break, and any content after, but before the next
> (if any) pb tag is considered a "page". I've got an example source doc and
> the results that I want. Anyone out there wanna take a stab at this? I
> haven't been successful (obviously), so even a tip or hint as to how I'd go
> about solving this would be just completely sweet.
>
> Here is my example: http://pastie.org/458993

What went wrong with the SAX style parser we were talking about? Do you
actually need an XML document returned?
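The SAX-style approach suggested above can be sketched as a listener that opens a new page at every `pb` start tag and appends character data to the current page. This sketch swaps in Ruby's standard-library REXML stream parser so it runs without Nokogiri (Nokogiri's own `Nokogiri::XML::SAX::Document` follows the same callback pattern). It assumes plain text per page is enough, which is what Aaron's second question is probing; it does not rebuild XML fragments. `PageCollector`, `paginate`, and the sample document are hypothetical, not taken from Matt's pastie:

```ruby
require "rexml/document"
require "rexml/parsers/streamparser"
require "rexml/streamlistener"

# Hypothetical SAX-style listener: every <pb> start tag opens a new page,
# and character data is appended to the current page. Markup is dropped.
class PageCollector
  include REXML::StreamListener

  attr_reader :pages

  def initialize
    @pages = []
  end

  # Called for every start tag, including empty ones such as <pb/>,
  # at any nesting depth.
  def tag_start(name, _attrs)
    @pages << +"" if name == "pb"
  end

  # Text that appears before the first <pb> is ignored.
  def text(data)
    @pages.last << data unless @pages.empty?
  end
end

# Returns one whitespace-normalised string per page.
def paginate(xml)
  collector = PageCollector.new
  REXML::Parsers::StreamParser.new(xml, collector).parse
  collector.pages.map { |page| page.split.join(" ") }
end

xml = <<~XML
  <doc>
    <front>Title page</front>
    <pb n="1"/><p>Alpha <i>beta</i></p>
    <pb n="2"/><p>Gamma</p>
  </doc>
XML

p paginate(xml) # ["Alpha beta", "Gamma"]
```

A streaming parser fits this problem because `pb` is an empty milestone element rather than a container, so a page boundary can fall anywhere in the tree, and a tree-based split would have to cut across element boundaries.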
--
Aaron Patterson
http://tenderlovemaking.com/

From aaron.patterson at gmail.com  Sun Apr 26 17:45:50 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sun, 26 Apr 2009 14:45:50 -0700
Subject: [Nokogiri-talk] Encoding bug
In-Reply-To: <20090427.042341.120037429.rubikitch@ruby-lang.org>
References: <6959e1680904251228g3cc81d6ye4c0cd5d61357227@mail.gmail.com> <20090427.024530.175560659.rubikitch@ruby-lang.org> <6959e1680904261133x4b3f964erac86fcd3f91e8ab9@mail.gmail.com> <20090427.042341.120037429.rubikitch@ruby-lang.org>
Message-ID: <6959e1680904261445u38625729ya6301cb8af4caf75@mail.gmail.com>

On Sun, Apr 26, 2009 at 12:23 PM,  wrote:
> From: Aaron Patterson
> Subject: Re: [Nokogiri-talk] Encoding bug
> Date: Sun, 26 Apr 2009 11:33:34 -0700
>
>>>> p x.encoding # => #
>>>>
>>>> require 'kconv'
>>>> p Kconv.guess(x) # => #
>>>
>>> It means encoding inconsistency.
>>
>> Can you put your test file in gist? I may not have set up the file correctly.
>
> http://gist.github.com/102140

Are you sure the file is written with EUC-JP and not UTF-8?

[aaron at Jordan gist-102140 (master)]$ git status
# On branch master
nothing to commit (working directory clean)
[aaron at Jordan gist-102140 (master)]$ file 27-023913.rb
27-023913.rb: UTF-8 Unicode text
[aaron at Jordan gist-102140 (master)]$ ~/.multiruby/install/1.9.1-p0/bin/ruby 27-023913.rb
27-023913.rb:7: invalid multibyte char (EUC-JP)
27-023913.rb:7: invalid multibyte char (EUC-JP)
[aaron at Jordan gist-102140 (master)]$

http://skitch.com/aaronp/bprm5/terminal-bash-80x24

--
Aaron Patterson
http://tenderlovemaking.com/
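Aaron's check with `file` points at a mismatch between the `# -*- coding: euc-jp -*-` magic comment and the bytes actually on disk. A similar check can be run from Ruby itself. The helper below, `plausible_encodings`, is hypothetical (it is not part of Ruby or Nokogiri). It only tests whether the bytes are well-formed in each candidate encoding, so it can rule an encoding out but cannot prove that one is correct, and pure-ASCII input passes every candidate:

```ruby
# Hypothetical helper: list the candidate encodings in which the given bytes
# are well-formed. valid_encoding? checks byte structure only, not meaning.
def plausible_encodings(bytes, candidates = [Encoding::UTF_8, Encoding::EUC_JP])
  candidates.select { |enc| bytes.dup.force_encoding(enc).valid_encoding? }
end

utf8_bytes = "\u3042\u3044\u3046".b                   # "あいう" saved as UTF-8
euc_bytes  = "\u3042\u3044\u3046".encode("EUC-JP").b  # same text saved as EUC-JP

p plausible_encodings(utf8_bytes) # [#<Encoding:UTF-8>]
p plausible_encodings(euc_bytes)  # [#<Encoding:EUC-JP>]
p plausible_encodings("abc".b)    # [#<Encoding:UTF-8>, #<Encoding:EUC-JP>]
```

Run against the gist's file, for example `plausible_encodings(File.binread("27-023913.rb"))`, Japanese text saved as UTF-8 would typically be rejected as EUC-JP. That is consistent with the `invalid multibyte char (EUC-JP)` error Ruby 1.9 reported above.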