From greg at intelligentassistance.com Sun Mar 1 14:08:07 2009
From: greg at intelligentassistance.com (Gregory Clarke)
Date: Sun, 1 Mar 2009 11:08:07 -0800
Subject: [Nokogiri-talk] would you use this feature? inner_html=
In-Reply-To:
References:
Message-ID:

Hi Andrew,

I implemented this as an alias:

module Nokogiri
  module XML
    class Node
      alias inner_html= content=
    end
  end
end

I needed this function because I was moving a project from Hpricot to
Nokogiri. I guess it depends on whether Nokogiri wants to be a "drop in"
replacement for Hpricot (in that no Hpricot code would need to be
rewritten).

Greg

> I made a ticket for this feature request: I would like Nodes to have
> an inner_html= function.
>
> http://nokogiri.lighthouseapp.com/projects/19607/tickets/46-feature-request-inner_html-method-on-node
>
> Aaron would like to know if anyone aside from me would use this
> feature, so would you? And for my own edification, what is the
> existing way to set the inner html of an element?
>
> Thanks,
> Andrew

From aaron.patterson at gmail.com Sun Mar 1 14:30:43 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sun, 1 Mar 2009 11:30:43 -0800
Subject: [Nokogiri-talk] would you use this feature? inner_html=
In-Reply-To:
References:
Message-ID: <6959e1680903011130y47761c77wa79b95768915e1af@mail.gmail.com>

On Sun, Mar 1, 2009 at 11:08 AM, Gregory Clarke wrote:
> Hi Andrew,
> I implemented this as an alias:
> module Nokogiri
>   module XML
>     class Node
>       alias inner_html= content=
>     end
>   end
> end
> I needed this function because I was moving a project from Hpricot to
> Nokogiri. I guess it depends on whether Nokogiri wants to be a "drop in"
> replacement for Hpricot (in that no Hpricot code would need to be
> rewritten).

Cool. I will add this. Not to be compatible with hpricot, but
because people want to use it.

Hpricot has no tests for this method, nokogiri passes all hpricot
tests. I am not interested in being compatible with an API that has
no spec. I hope that makes sense.

--
Aaron Patterson
http://tenderlovemaking.com/

From andrew at nextmobileweb.com Sun Mar 1 15:25:42 2009
From: andrew at nextmobileweb.com (Andrew Farmer)
Date: Sun, 1 Mar 2009 12:25:42 -0800
Subject: [Nokogiri-talk] would you use this feature? inner_html=
In-Reply-To: <6959e1680903011130y47761c77wa79b95768915e1af@mail.gmail.com>
References: <6959e1680903011130y47761c77wa79b95768915e1af@mail.gmail.com>
Message-ID:

On Sun, Mar 1, 2009 at 11:30 AM, Aaron Patterson wrote:
> Cool. I will add this. Not to be compatible with hpricot, but
> because people want to use it.
>
> Hpricot has no tests for this method, nokogiri passes all hpricot
> tests. I am not interested in being compatible with an API that has
> no spec. I hope that makes sense.

Aaron,

That totally makes sense. It's good to have a better idea of what your
goals are and aren't for this project.

Gregory,

content= will encode the string you give it, so you can't use it to set
the HTML inside of a Node. Here's an example of what content= does and
what I would expect inner_html= to do.

  doc = Nokogiri.HTML "<div>one</div>"
  div = doc.at("div")
  div.content= "<div>two</div>"
  div.to_html

  OUTPUT   => "<div>&lt;div&gt;two&lt;/div&gt;</div>"
  EXPECTED => "<div><div>two</div></div>"

From phlip2005 at gmail.com Wed Mar 4 23:39:28 2009
From: phlip2005 at gmail.com (Phlip)
Date: Wed, 4 Mar 2009 20:39:28 -0800
Subject: [Nokogiri-talk] XPath subtitutions
Message-ID: <860c114f0903042039g54e6cdb1pe6fd91a835d7711e@mail.gmail.com>

Nokogirists:

I could really use this notation, from libxml-ruby:

  node.find('/path/item[ @id = $id and @value = $value ]',
            { :id => 42, :value => 'frob' })

The reason is I need to provide a DSL that can absorb _any_ values,
including ones with mixed ' and '' ticks. The substitution system
solves that problem out-of-band.

How close is Nokogiri to this feature?

--
  Phlip
  http://www.oreilly.com/catalog/9780596510657/ ^ assert_xpath
  http://tinyurl.com/yrc77g <-- assert_latest Model

From aaron.patterson at gmail.com Thu Mar 5 00:53:46 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Wed, 4 Mar 2009 21:53:46 -0800
Subject: [Nokogiri-talk] XPath subtitutions
In-Reply-To: <860c114f0903042039g54e6cdb1pe6fd91a835d7711e@mail.gmail.com>
References: <860c114f0903042039g54e6cdb1pe6fd91a835d7711e@mail.gmail.com>
Message-ID: <6959e1680903042153q21609a18leca28e768a2d46c4@mail.gmail.com>

On Wed, Mar 4, 2009 at 8:39 PM, Phlip wrote:
> Nokogirists:
>
> I could really use this notation, from libxml-ruby:
>
>   node.find('/path/item[ @id = $id and @value = $value ]',
>             { :id => 42, :value => 'frob' })

weird. I don't see this syntax in the libxml-ruby docs or tests.

> The reason is I need to provide a DSL that can absorb _any_ values,
> including ones with mixed ' and '' ticks. The substitution system
> solves that problem out-of-band.
>
> How close is Nokogiri to this feature?

Maybe I can convince you to use a different syntax. ;-)

I would suggest you use custom xpath (or css) functions. It basically
lets you call out to arbitrary ruby functions.
Take a look at this example:

  http://gist.github.com/74217

I believe this is better than string substitution because it is less
error prone, and gives you more power. You could let your users search
via regular expressions, not just numbers and words. Also, no string
substitution mistakes.

Hope that helps.

--
Aaron Patterson
http://tenderlovemaking.com/

From phlip2005 at gmail.com Thu Mar 5 08:33:37 2009
From: phlip2005 at gmail.com (Phlip)
Date: Thu, 05 Mar 2009 05:33:37 -0800
Subject: [Nokogiri-talk] XPath subtitutions
In-Reply-To: <6959e1680903042153q21609a18leca28e768a2d46c4@mail.gmail.com>
References: <860c114f0903042039g54e6cdb1pe6fd91a835d7711e@mail.gmail.com> <6959e1680903042153q21609a18leca28e768a2d46c4@mail.gmail.com>
Message-ID: <49AFD4B1.1040208@gmail.com>

Aaron Patterson wrote:

>> I could really use this notation, from libxml-ruby:
>>
>>   node.find('/path/item[ @id = $id and @value = $value ]',
>>             { :id => 42, :value => 'frob' })
>
> weird. I don't see this syntax in the libxml-ruby docs or tests.

Sorry: It was REXML. I ran out of time to write yesterday...

>> How close is Nokogiri to this feature?
>
> Maybe I can convince you to use a different syntax. ;-)

Of course we can change the syntax, but...

How to advertise we can search for _any_ value? I need to turn this...

  xpath :input, :value => anything

into

  //input[ @value = anything ]

I can't go telling a user-programmer, "oh, sorry, we can't search for
'can't', because it has an apostrophe, and we use that for our string
delimiter in the XPath."

I will look at a custom matcher, such as

  equals(@value, pop_from_a_hash(:anything))

That seems to me to belong on the Nokogiri side...

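
The apostrophe problem described above can also be handled without any
library support, by quoting the value in pure Ruby before it is spliced
into the query. The sketch below is not what the thread settles on (the
thread uses custom XPath functions); it is the common concat() workaround
for XPath 1.0, which has no escape character inside string literals. The
helper name xpath_literal is hypothetical and not part of Nokogiri or
REXML.

```ruby
# Build an XPath 1.0 string literal for an arbitrary Ruby value.
# xpath_literal is a hypothetical helper name, not a library method.
def xpath_literal(value)
  str = value.to_s

  # No apostrophe: plain single-quoted literal is safe.
  return "'#{str}'" unless str.include?("'")

  # Apostrophes but no double quotes: use a double-quoted literal.
  return "\"#{str}\"" unless str.include?('"')

  # Both quote styles present: split on apostrophes (keeping empty
  # pieces), single-quote each piece, and glue the pieces back together
  # with a double-quoted apostrophe inside concat().
  parts = str.split("'", -1).map { |piece| "'#{piece}'" }
  "concat(#{parts.join(%q{, "'", })})"
end

puts xpath_literal('frob')
puts xpath_literal("can't")
puts xpath_literal(%q{say "can't"})

# Splicing the literal into a query string:
query = "//input[ @value = #{xpath_literal("can't")} ]"
puts query
```

The resulting string can be handed to any XPath 1.0 engine. The trade-off
against a custom function is that this only covers equality on strings,
while a Ruby callback can also match regular expressions.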
--
  Phlip

From phlip2005 at gmail.com Thu Mar 5 09:38:18 2009
From: phlip2005 at gmail.com (Phlip)
Date: Thu, 05 Mar 2009 06:38:18 -0800
Subject: [Nokogiri-talk] XPath subtitutions
In-Reply-To: <6959e1680903042153q21609a18leca28e768a2d46c4@mail.gmail.com>
References: <860c114f0903042039g54e6cdb1pe6fd91a835d7711e@mail.gmail.com> <6959e1680903042153q21609a18leca28e768a2d46c4@mail.gmail.com>
Message-ID: <49AFE3DA.20504@gmail.com>

Aaron Patterson wrote:

> Hope that helps.

K I will try to start with this:

  @xdoc = Nokogiri::XML ' whatevs zp.com '

  @xdoc.xpath('//a[_search(.)]', Class.new {
    def initialize hash = {}
      @hash = hash
    end

    def _evalerate(node)
      @hash.each do |k, v|
        q = node[k.to_s]
        return false unless
          case v
          when Symbol ; q == v.to_s
          when String ; q == v
          when Regexp ; q =~ v
          else        ; false  # TODO raise bad arg error!
          end
      end
      return true
    end

    def _search nodes
      nodes.find_all{|node| _evalerate(node) }
    end
  }.new(:href => /zeroplayer/, :id => 'one'))

That looks robust enough for my needs; tx!

--
  Phlip

From julien.genestoux at gmail.com Thu Mar 5 19:22:34 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Thu, 5 Mar 2009 16:22:34 -0800
Subject: [Nokogiri-talk] We need a better XML Builder!
Message-ID: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com>

Hey,

I don't know about you but I am having a lot of trouble with Nokogiri's
Builder; It's not that it's not working, but I just find the syntax
counter intuitive : why would we need to build the whole XML in a block?
It makes it difficult to pass variables (through @context).

So, why not re-using Builder's syntax with

  xml = Nokogiri::Builder.new
  xml.tag do
    xml.subtag(:attr => "myvalue")
  end

?

--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29

From phlip2005 at gmail.com Thu Mar 5 21:03:18 2009
From: phlip2005 at gmail.com (Phlip)
Date: Thu, 05 Mar 2009 18:03:18 -0800
Subject: [Nokogiri-talk] We need a better XML Builder!
In-Reply-To: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com>
References: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com>
Message-ID: <49B08466.1080003@gmail.com>

Julien Genestoux wrote:

> I don't know about you but I am having a lot of trouble with Nokogiri's
> Builder; It's not that it's not working, but I just find the syntax
> counter intuitive : why would we need to build the whole XML in a block?
> It makes it difficult to pass variables (through @context).

+1.

Also, compare "why can't to_xml read my mind?":

  http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/85a8ddcf8054d7a3

It turns out to_xml can come close, using :only, :methods, and the mighty
:proc.

So consider this strategy:

 - a deep Ruby model (Hash, ActiveRecord, whatever)

 - a DSL (hashes of arrays of symbols) declaring
    a tree of paths thru that model

 - a (lite) transformation layer declaring how to
    convert and mark-up that tree of paths

I suggest this because AR's .from_xml() totally _cannot_ read my mind, and
it stands to reason that passing a DSL into .to_xml() should create XML
that .from_xml() could deserialize, given the same DSL.

--
  Phlip
  http://www.zeroplayer.com/

From aaron.patterson at gmail.com Thu Mar 5 22:21:41 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 5 Mar 2009 19:21:41 -0800
Subject: [Nokogiri-talk] We need a better XML Builder!
In-Reply-To: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com>
References: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com>
Message-ID: <6959e1680903051921q2b87ad3bw158c45a7779c3a30@mail.gmail.com>

On Thu, Mar 5, 2009 at 4:22 PM, Julien Genestoux wrote:
> Hey,
>
> I don't know about you but I am having a lot of trouble with Nokogiri's
> Builder; It's not that it's not working, but I just find the syntax counter
> intuitive : why would we need to build the whole XML in a block? It makes it
> difficult to pass variables (through @context).
>
> So, why not re-using Builder's syntax with
>   xml = Nokogiri::Builder.new
>   xml.tag do
>     xml.subtag(:attr => "myvalue")
>   end

Thanks for the feedback! I'd rather not do the syntax you just
spelled out because it means that I would have to keep track of depths
and block calls. What do you think of something like this?

  string = "hello world"
  builder = Nokogiri::XML::Builder.new
  builder.root { |root|
    root.cdata string
  }

It's very easy to implement, only a 3 line change. I *think* it gets
you what you need. Let me know!

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron.patterson at gmail.com Thu Mar 5 22:23:38 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 5 Mar 2009 19:23:38 -0800
Subject: [Nokogiri-talk] We need a better XML Builder!
In-Reply-To: <49B08466.1080003@gmail.com>
References: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com> <49B08466.1080003@gmail.com>
Message-ID: <6959e1680903051923p7ab079ebp3ce0d6c749c87c0a@mail.gmail.com>

On Thu, Mar 5, 2009 at 6:03 PM, Phlip wrote:
> Julien Genestoux wrote:
>
>> I don't know about you but I am having a lot of trouble with Nokogiri's
>> Builder; It's not that it's not working, but I just find the syntax counter
>> intuitive : why would we need to build the whole XML in a block? It makes it
>> difficult to pass variables (through @context).
>
> +1.
>
> Also, compare "why can't to_xml read my mind?":
>
> http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/85a8ddcf8054d7a3
>
> It turns out to_xml can come close, using :only, :methods, and the mighty
> :proc.
>
> So consider this strategy:
>
>  - a deep Ruby model (Hash, ActiveRecord, whatever)
>
>  - a DSL (hashes of arrays of symbols) declaring
>     a tree of paths thru that model
>
>  - a (lite) transformation layer declaring how to
>     convert and mark-up that tree of paths
>
> I suggest this because AR's .from_xml() totally _cannot_ read my mind, and
> it stands to reason that passing a DSL into .to_xml() should create XML
> that .from_xml() could deserialize, given the same DSL.

I'm so confused.....

--
Aaron Patterson
http://tenderlovemaking.com/

From phlip2005 at gmail.com Fri Mar 6 08:59:17 2009
From: phlip2005 at gmail.com (Phlip)
Date: Fri, 06 Mar 2009 05:59:17 -0800
Subject: [Nokogiri-talk] We need a better XML Builder!
In-Reply-To: <6959e1680903051923p7ab079ebp3ce0d6c749c87c0a@mail.gmail.com>
References: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com> <49B08466.1080003@gmail.com> <6959e1680903051923p7ab079ebp3ce0d6c749c87c0a@mail.gmail.com>
Message-ID: <49B12C35.1090807@gmail.com>

> I'm so confused.....

I will figure out a way to write a reference implementation!

From julien.genestoux at gmail.com Fri Mar 6 10:27:54 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Fri, 6 Mar 2009 07:27:54 -0800
Subject: [Nokogiri-talk] We need a better XML Builder!
In-Reply-To: <6959e1680903051921q2b87ad3bw158c45a7779c3a30@mail.gmail.com>
References: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com> <6959e1680903051921q2b87ad3bw158c45a7779c3a30@mail.gmail.com>
Message-ID: <26c0cf900903060727k380af398l8030e5c8498ef0e5@mail.gmail.com>

Hum,

Yes, that looks better!
I'm looking forward to your implementation.
Thanks,

--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29

From phlip2005 at gmail.com Fri Mar 6 10:53:13 2009
From: phlip2005 at gmail.com (Phlip)
Date: Fri, 6 Mar 2009 07:53:13 -0800
Subject: [Nokogiri-talk] XML::Reader cannot read my mind!
Message-ID: <860c114f0903060753u339e8b5bu3b7edafdd96adf51@mail.gmail.com>

Nokogirists:

My experience reading XML revolves around XPath queries. I always
guessed that readers would operate like this:

  some_xml = ''

  reader = Read(some_xml)

  reader.on :foo do
    p 'I found a foo!'
  end

  reader.on :bar do
    p 'I found a foo!'
  end

  reader.on :baz do
    p 'I found a foo!'
  end

Do any readers work like that?
Or should I write it up, try it at work, and see if it's non-heinous?

--
  Phlip

From aaron.patterson at gmail.com Fri Mar 6 12:11:08 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Fri, 6 Mar 2009 09:11:08 -0800
Subject: [Nokogiri-talk] XML::Reader cannot read my mind!
In-Reply-To: <860c114f0903060753u339e8b5bu3b7edafdd96adf51@mail.gmail.com>
References: <860c114f0903060753u339e8b5bu3b7edafdd96adf51@mail.gmail.com>
Message-ID: <6959e1680903060911m686078a2m8c94eb4182d0b818@mail.gmail.com>

On Fri, Mar 6, 2009 at 7:53 AM, Phlip wrote:
> Nokogirists:
>
> My experience reading XML revolves around XPath queries. I always
> guessed that readers would operate like this:
>
>   some_xml = ''
>
>   reader = Read(some_xml)
>
>   reader.on :foo do
>     p 'I found a foo!'
>   end
>
>   reader.on :bar do
>     p 'I found a foo!'
>   end
>
>   reader.on :baz do
>     p 'I found a foo!'
>   end

  require 'nokogiri'

  reader = Nokogiri::XML::Reader(<<-eoxml)
  eoxml

  reader.each do |node|
    case node.name
    when 'foo'
      ...
    when 'bar'
      ...
    when 'baz'
      ...
    end
  end

> Do any readers work like that? Or should I write it up, try it at
> work, and see if it's non-heinous?

Sure!

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron.patterson at gmail.com Fri Mar 6 16:31:19 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Fri, 6 Mar 2009 13:31:19 -0800
Subject: [Nokogiri-talk] We need a better XML Builder!
In-Reply-To: <26c0cf900903060727k380af398l8030e5c8498ef0e5@mail.gmail.com>
References: <26c0cf900903051622r724fd245xa954605ff5e77117@mail.gmail.com> <6959e1680903051921q2b87ad3bw158c45a7779c3a30@mail.gmail.com> <26c0cf900903060727k380af398l8030e5c8498ef0e5@mail.gmail.com>
Message-ID: <6959e1680903061331q55f35704la8ce5fe04f1c8080@mail.gmail.com>

On Fri, Mar 6, 2009 at 7:27 AM, Julien Genestoux wrote:
> Hum,
>
> Yes, that looks better!
> I'm looking forward to your implementation.
> Thanks,

Committed.
It will be in 1.2.2

  http://github.com/tenderlove/nokogiri/commit/b976b1b7376fd755f0843b77885486224a189b3a

--
Aaron Patterson
http://tenderlovemaking.com/

From phlip2005 at gmail.com Sat Mar 7 02:42:40 2009
From: phlip2005 at gmail.com (Phlip)
Date: Fri, 06 Mar 2009 23:42:40 -0800
Subject: [Nokogiri-talk] A light, flexible system to reconstitute to_xml's output into a new object model
Message-ID: <49B22570.7020403@gmail.com>

require 'test_helper'
require 'nokogiri'

class PostTest < ActiveSupport::TestCase

=begin
Rails has a powerful to_xml system, and it does not seem to have a
matching from_xml. Ideally, such a system would take the same arguments
as to_xml - or different arguments, if the XML must reconstitute into a
different object model!

Until then, here's a simple alternative with Nokogiri. (Using the "Merb
Mind Maps" data fixtures), we can easily use
Post.all.to_xml(:include => [:tags, :authors]) to create an XML
representation of a database full of Posts, Tags, and Authors:

  press my playback to make it last 319949270 C30, C60, C90, Go 830026288
  Bow Wow Wow 470928291 new wave 24035091 british 564391927 surf corknut!
  877507363 Oryx and Crake 355119293 Margaret Atwood 115956663 book ...

That requires this rather elaborate (yet extremely DRY) declaration:
=end

  def posts_to_xml
    posts = Post.all(:order => 'title')

    posts.to_xml( :skip_types => true,
                  :dasherize => false,
                  :include => [:author, :tags],
                  :except => [:author_id, :post_id, :tag_id] )
  end

=begin
The fun hits the fan when we try to reconstitute that into an object
model. We need some XML records to create new database records, and some
to only update existing ones. Note that posts can share tags and authors,
so we must expect to re-evaluate them, without accidentally creating
duplicates.

We do it by nested blocks that attempt to match the nesting structure in
our XML and our object model.
=end

  def reconstitute(xml)
    doc = Nokogiri::XML(xml)

    doc.on 'posts/post', :id, :title, :body do |node, id, *data|
      post = Post.find_or_initialize_by_id(id)
      post.update_attributes node.data

      node.on 'tags/tag', :id, :name do |n, id, name|
        tag = Tag.find_or_initialize_by_id(id)
        tag.name = name
        tag.save!
        post.tags << tag
      end

      node.on 'author', :id, :name do |n, id, name|
        author = Author.find_or_initialize_by_id(id)
        author.update_attributes n.data
        post.author = author  # or just update_attribute
        post.save!
      end
    end
  end

=begin
We do it by wrapping .xpath() in .on(), a method that accepts:

 - the relative path to our record
 - the tag names (as symbols) of the data fields we need

It yields the current node, those data fields' text values, and a hash of
those values into its block. The block can then use those items to
reassemble its node. These blocks use 'id's to detect duplicate records,
but they could just as easily have used a unique signature, such as a
post title.
=end

  class ::Nokogiri::XML::Node
    def on(tag_name, *needs, &block)
      xpath(tag_name).each do |node|
        gots = []

        node.data = needs.inject({}) do |h,n|
          h[n] = gots << node.xpath(n.to_s).text
          h
        end

        block.call(node, *gots)
      end   # note ruby > 1.8.7 can drop the gots[] system
            #  and just use @data.values,
    end     #  (against the objections of one William James!;)

    attr_accessor :data
  end

=begin
This test trivially puts it all together. It converts our database to
XML, erases our database, calls reconstitute(xml), asserts that our
records came back, spot-checks their links, and then bulk-asserts that
the entire database is now exactly the same (as far as to_xml can tell!)
=end

  def test_reconstitute
    xml = posts_to_xml
    models = [Author, Post, Tag]
    models.each &:destroy_all

    authors, posts, tags = assert_latest *models do
      reconstitute(xml)
    end

    assert{ posts.map(&:title).include? 'Lithium' }
    assert{ posts.map(&:author).map(&:name).include? 'Bob Marley' }
    assert{ posts.map(&:tags).flatten.map(&:name).include? 'surf' }
    assert{ xml == posts_to_xml }  # the ultimate test!
  end

end

From phlip2005 at gmail.com Sat Mar 7 10:11:35 2009
From: phlip2005 at gmail.com (Phlip)
Date: Sat, 07 Mar 2009 07:11:35 -0800
Subject: [Nokogiri-talk] A light, flexible system to reconstitute to_xml's output into a new object model
In-Reply-To: <49B22570.7020403@gmail.com>
References: <49B22570.7020403@gmail.com>
Message-ID: <49B28EA7.2080004@gmail.com>

require 'test_helper'
require 'nokogiri'

class PostTest < ActiveSupport::TestCase

=begin
(This upgrades a little from last night's post, then refactors more...)

Rails has a powerful to_xml system, and it does not seem to have a
matching from_xml. Ideally, such a system would take the same arguments
as to_xml - or different arguments, if the XML must reconstitute into a
different object model!

Until then, here's a simple alternative with Nokogiri. Using the "Merb
Mind Maps" data fixtures, we can easily use
Post.all.to_xml(:include => [:tags, :authors]) to create an XML
representation of a database full of Posts, Tags, and Authors:

  press my playback to make it last 319949270 C30, C60, C90, Go 830026288
  Bow Wow Wow 470928291 new wave 24035091 british 564391927 surf corknut!
  877507363 Oryx and Crake 355119293 Margaret Atwood 115956663 book ...

That requires this rather elaborate (yet extremely DRY) declaration:
=end

  def posts_to_xml
    posts = Post.all(:order => 'title')

    posts.to_xml( :skip_types => true,
                  :dasherize => false,
                  :include => [:author, :tags],
                  :except => [:author_id, :post_id, :tag_id] )
  end

=begin
The fun hits the fan when we try to reconstitute that into an object
model. We need some XML records to create new database records, and some
to only update existing ones. Note that posts can share tags and authors,
so we must expect to re-evaluate them, without accidentally creating
duplicates.

We do it by nested blocks that attempt to match the nesting structure in
our XML and our object model.
=end

  def reconstitute(xml)
    doc = Nokogiri::XML(xml)

    doc.convert 'posts/post', :id, :title, :body do |node, id, *data|
      post = Post.find_or_initialize_by_id(id)
      post.update_attributes node.data

      node.convert 'tags/tag', :id, :name do |n, id, name|
        tag = Tag.find_or_initialize_by_id(id)
        tag.update_attribute :name, name
        post.tags << tag
      end

      node.convert 'author', :id, :name do |n, id, name|
        author = Author.find_or_initialize_by_id(id)
        author.update_attributes n.data
        post.update_attribute :author, author
      end
    end
  end

=begin
We do it by wrapping .xpath() in .convert(), a method that accepts:

 - the relative path to our record
 - the tag names (as symbols) of the data fields we need

It yields the current node, those data fields' text values, and a hash of
those values into its block. The block can then use those items to
reassemble its node. These blocks use 'id's to detect duplicate records,
but they could just as easily have used a unique signature, such as a
post title.
=end

  class ::Nokogiri::XML::Node
    def convert(tag_name, *needs, &block)
      xpath(tag_name).map do |node|
        gots = []

        node.data = needs.inject({}) do |h,n|
          gots << h[n] = node.xpath(n.to_s).text
          h
        end

        block.call(node, *gots)
      end
    end

    attr_accessor :data
  end

=begin
This test trivially puts it all together. It converts our database to
XML, erases our database, calls reconstitute(xml), asserts that our
records came back, spot-checks their links, and then bulk-asserts that
the entire database is the same (as far as to_xml can tell!)
=end

  def test_reconstitute
    xml = posts_to_xml
    models = [Author, Post, Tag]
    models.map &:destroy_all

    authors, posts, tags = assert_latest *models do
      reconstitute(xml)
    end

    assert{ posts.map(&:title).include? 'Lithium' }
    assert{ posts.map(&:author).map(&:name).include? 'Bob Marley' }
    assert{ posts.map(&:tags).flatten.map(&:name).include? 'surf' }
    assert{ xml == posts_to_xml }  # the ultimate test!
  end

=begin
Now we will clone reconstitute, and its test, forming reconstitute_too,
and test_reconstitute_too. We will leave the first, simpler, reconstitute
online while we refactor more of reconstitute_too into our emerging DSL.
Leaving both systems online while upgrading Node#convert will force our
DSL to cover both simple and advanced inputs.
=end

  def test_reconstitute_too
    xml = posts_to_xml
    models = [Author, Post, Tag]
    models.map &:destroy_all

    authors, posts, tags = assert_latest *models do
      reconstitute_too(xml)
    end

    assert{ posts.map(&:title).include? 'Lithium' }
    assert{ posts.map(&:author).map(&:name).include? 'Bob Marley' }
    assert{ posts.map(&:tags).flatten.map(&:name).include? 'surf' }
    assert{ xml == posts_to_xml }  # the ultimate test!
  end

  def reconstitute_too(xml)
    doc = Nokogiri::XML(xml)

    doc.convert Post, :id, :title, :body do |post, node, *|
      post.tags = node.convert(Tag, :id, :name)
      post.author = *node.convert(Author, :id, :name)
      post.save!
    end
  end

  class ::Nokogiri::XML::Node
    def convert(tag_name, *needs, &block)
      if tag_name.is_a? Class
        singular = tag_name.name.downcase
        plural = singular.pluralize + '/' + singular

        add_record = lambda do |n, *data|
          record = tag_name.send "find_or_initialize_by_#{needs.first}", data.first
          record.update_attributes n.data
          block.call(record, n, *data) if block
          record
        end

        return [ convert(singular, *needs, &add_record),
                 convert(plural, *needs, &add_record) ].flatten.compact
      else
        xpath(tag_name).map do |node|
          gots = []

          node.data = needs.inject({}) do |h,n|
            value = node.xpath(n.to_s).text
            gots << value
            h[n] = value
            h
          end

          block.call(node, *gots) if block
        end
      end   # note ruby1.9 can drop the gots[] system
            #  and just use node.data.values,
    end     #  (against the objections of one William James!;)

    attr_accessor :data
  end

=begin
And that's the DSL.
The top .convert{} call looks like the data in your XML channel, and its
declaration is very DRY:

    doc.convert Post, :id, :title, :body do |post, node, *|
      post.tags = node.convert(Tag, :id, :name)
      post.author = *node.convert(Author, :id, :name)
      post.save!
    end

You could easily add new variables to your data channel by adding new
symbols to that block. You can add more nested records by calling convert
in its old mode, and in convert(Model) mode you can also yield the new
record, the node from whence it came, and its :symbol values like this:

    node.convert(Tag, :id, :name) do |tag, node, id, name|
      # more processing here
    end

Now some questions. Couldn't we pass the association itself in?

    node.convert(post.tags, :id, :name)

I tried, but find_or_initialize_by_ reacted poorly to seeing the post_id
preset...

Otherwise, could we reasonably squeeze that DSL down to this?

    doc.from_xml Post, :id, :title, :body,
                 :tags => [:id, :name],
                 :author => [:id, :name, lambda{}]
=end

end

From nolan at thewordnerd.info Sat Mar 7 15:39:03 2009
From: nolan at thewordnerd.info (Nolan Darilek)
Date: Sat, 07 Mar 2009 14:39:03 -0600
Subject: [Nokogiri-talk] Xpath and namespaces
Message-ID: <49B2DB67.9040802@thewordnerd.info>

Hi, all. This is more of an xpath than a nokogiri question, and I
wouldn't be asking it here if google had given me anything that worked.
:( Hoping someone here can either give me the nokogiri way to do this, or
can suggest modifications to the framework I'm using to make it possible.

I'm trying to use Babylon (http://github.com/julien51/babylon), an XMPP
component framework. Babylon creates a router that dispatches stanzas
matched via xpath to specific controllers and actions.
The examples shown in the documentation do very simple stanza mappings:

# my_route_1:
#   priority: 1
#   controller: main
#   xpath: "//message"
#   action: echo

The problem, though, is that if you start doing anything complicated with
XMPP then you have to match against namespaces, because is very different
from disco#items or jabber:iq:register. I need to map the following XML,
for instance:

DEBUG BABYLON: RECEIVED

Specifically, I need to match the fact that //iq/query's namespace is the
URI http://jabber.org/protocol/disco#info. Extra perfectionist points for
matching the fact that the query element is empty, but I don't think that
is specifically necessary. :)

First thing I tried was matching the xmlns attribute, but a bit of
research showed that namespaces weren't attributes. I seem to be able to
match //iq/query just fine, but can't get the namespace at all. I also
tried:

  //iq[@type='get']/*[namespace-uri() = 'http://jabber.org/protocol/disco#info']

but this doesn't do it either.

So is there any way to accomplish this using single contained xpath
expressions, no access to the underlying library to alias namespaces or
anything fancy like that? If not, then I guess it's time to reconsider
xpath-based routing.

Thanks.

From phlip2005 at gmail.com Sat Mar 7 20:47:43 2009
From: phlip2005 at gmail.com (Phlip)
Date: Sat, 07 Mar 2009 17:47:43 -0800
Subject: [Nokogiri-talk] [ANN] ActiveRecord .from_xml upgrade
Message-ID: <49B323BF.1050303@gmail.com>

Rubies:

The gist of this tiny code snip...

  http://gist.github.com/75525

...is a light but flexible DSL that converts XML - typically output by
to_xml() - into an ActiveRecord object model.

==create or update records==

Here's the simplest example:

  xml =' 323285 323310 ... '
  doc = Nokogiri::XML(xml)
  photos = doc.from_xml(Photo, :id)

(Note that from_xml{} is a member of a Node, not of your Model.)

That code created new Photo records with matching IDs. If any record were
already there, the code would update it instead.
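
The create-or-update behaviour described in this section can be sketched
without ActiveRecord. The following is only an illustration of the idea,
not the code from the gist: it uses REXML instead of Nokogiri so that it
runs with Ruby's bundled libraries, and the Photo struct, the store hash,
the upsert_photos helper, the caption field, and the <photos>/<photo>
tag layout are all assumptions made up for the example.

```ruby
require 'rexml/document'

# Hypothetical stand-in for an ActiveRecord model; not part of the gist.
Photo = Struct.new(:id, :caption)

# For each <photo> element, find the record with a matching <id> in the
# store (a plain Hash standing in for the database) or create it, then
# refresh its fields from the XML. Returns the touched records.
def upsert_photos(xml, store)
  doc = REXML::Document.new(xml)

  REXML::XPath.match(doc, '//photo').map do |node|
    id = node.elements['id'].text
    photo = (store[id] ||= Photo.new(id))

    # caption is an assumed extra field, updated only when present.
    caption = node.elements['caption']
    photo.caption = caption.text if caption

    photo
  end
end

# One record already exists; the second id is new.
store = { '323285' => Photo.new('323285', 'old caption') }
xml = '<photos>' \
      '<photo><id>323285</id><caption>new caption</caption></photo>' \
      '<photo><id>323310</id></photo>' \
      '</photos>'

photos = upsert_photos(xml, store)
puts photos.map(&:id).inspect
puts store.size
puts store['323285'].caption
```

Keying the lookup on the id is what keeps repeated input from creating
duplicates, which is the property the section above relies on.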

==rename fields and pass in data==

Here's the next more complicated example:

  authors = node.from_xml(Author, [:id, :remote_id], :name)

The code reads an <author> tag, then finds or creates an author with a matching author.remote_id. Then the code updates the author.name, and returns an array of authors.

==associations==

from_xml takes an optional &block, and yields into this the record under construction, before its .save! call. Use this block to seek nested data, and plug them into their parent record:

  doc.from_xml Post, :id, :title, :body do |post, node, *|
    post.tags = node.from_xml(Tag, :id, :name)
    post.author = *node.from_xml(Author, :id, :name)
    post.save!
  end

from_xml{} will call that block each time it finds a (top-level) record, and each nested node.from_xml{} call will only find records inside that main record. (Note, also, that <tag> records, for example, should be shared between many <post> records, and your XML will probably just duplicate them many times, but from_xml(Tag) knows to fold them all back together again...)

The splat operator * threw away three more arguments - they were the string values of the id, title, and body fields.

==raw XML==

To scan your XML with very similar abilities, but without using a Model with the correct name to match your XPath, write the XPath directly into the lower-level convert{} method:

  node.convert 'tags/tag', :id, :name do |n, id, name|
    tag = Tag.find_or_initialize_by_id(id)
    tag.update_attribute :name, name  # or tag.attributes = n.data
    post.tags << tag
  end

That block shows from_xml{} "unrolled" into its low-level behavior. convert{} takes an XPath query, relative to the current node, and a list of fields (and their renamers) to extract. Then it yields the detected node (don't call it "node"!) into its |goal posts|, with the string value of each detected field.
Your block could have done something more complex, but this one merely simulated from_xml{} by reconstituting and updating a Tag record, then inserted it into some outer post object.

One more detail - the renamed fields, and their string values, are also available as a hash. To avoid even more extra arguments into our goal posts, the committee stashed them into the passed node, as an attribute called "node.data". So the little comment shows how to update all your Model attributes at once.

==what about to_xml?==

One ActiveRecord FAQ goes, "Why can't from_xml take the same arguments as to_xml?" The reason is that creation is harder than just reading an existing object model. While a future version of from_xml{} could indeed learn to follow model associations, and could take a big blob of nested hashes, like most other ActiveRecord methods, the committee does not foresee this DSL exactly matching the input to to_xml(). That is a goal for further research on both sides!

--
  Phlip
  http://www.zeroplayer.com/

From aaron.patterson at gmail.com Sat Mar 7 21:04:15 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 7 Mar 2009 18:04:15 -0800
Subject: [Nokogiri-talk] Xpath and namespaces
In-Reply-To: <49B2DB67.9040802@thewordnerd.info>
References: <49B2DB67.9040802@thewordnerd.info>
Message-ID: <6959e1680903071804o548301abifc61ef2f9716dbf9@mail.gmail.com>

On Sat, Mar 7, 2009 at 12:39 PM, Nolan Darilek wrote:
> Hi, all. This is more of an xpath than a nokogiri question, and I wouldn't
> be asking it here if google had given me anything that worked. :( Hoping
> someone here can either give me the nokogiri way to do this, or can suggest
> modifications to the framework I'm using to make it possible.
>
> I'm trying to use Babylon (http://github.com/julien51/babylon), an XMPP
> component framework. Babylon creates a router that dispatches stanzas
> matched via xpath to specific controllers and actions.
>
> The examples shown in the documentation do very simple stanza mappings:
>
> # my_route_1:
> #     priority: 1
> #     controller: main
> #     xpath: "//message"
> #     action: echo
>
> The problem, though, is that if you start doing anything complicated with
> XMPP then you have to match against namespaces, because <query
> xmlns="http://jabber.org/protocol/disco#info"/> is very different from
> disco#items or jabber:iq:register. I need to map the following XML, for
> instance:
>
> DEBUG BABYLON: RECEIVED <iq ...
> to='deluxmpp.thewordnerd.info' xml:lang='en' type='get' id='43'><query
> xmlns='http://jabber.org/protocol/disco#info'/></iq>
>
> Specifically, I need to match the fact that //iq/query's namespace is the
> URI http://jabber.org/protocol/disco#info. Extra perfectionist points for
> matching the fact that the query element is empty, but I don't think that is
> specifically necessary. :)
>
> First thing I tried was matching the xmlns attribute, but a bit of research
> showed that namespaces weren't attributes. I seem to be able to match
> //iq/query just fine, but can't get the namespace at all. I also tried:
>
> //iq[@type='get']/*[namespace-uri() =
> 'http://jabber.org/protocol/disco#info']
>
> but this doesn't do it either.
>
> So is there any way to accomplish this using single contained xpath
> expressions, no access to the underlying library to alias namespaces or
> anything fancy like that? If not, then I guess it's time to reconsider
> xpath-based routing.

You must register the URL with your xpath engine before you can use it in searches.
So, with nokogiri I would:

  doc = Nokogiri::XML(<<-eoxml)
  <iq ... xml:lang='en' type='get' id='43'>
    <query xmlns='http://jabber.org/protocol/disco#info'/>
  </iq>
  eoxml

  # Find any query node
  doc.xpath('//iq/disco:query',
    'disco' => 'http://jabber.org/protocol/disco#info'
  ).each do |node|
    p node
  end

--
Aaron Patterson
http://tenderlovemaking.com/

From nolan at thewordnerd.info Sat Mar 7 22:03:52 2009
From: nolan at thewordnerd.info (Nolan Darilek)
Date: Sat, 07 Mar 2009 21:03:52 -0600
Subject: [Nokogiri-talk] Xpath and namespaces
References: <49B2DB67.9040802@thewordnerd.info> <6959e1680903071804o548301abifc61ef2f9716dbf9@mail.gmail.com>
Message-ID: <49B33598.9090403@thewordnerd.info>

On 03/07/2009 08:04 PM, Aaron Patterson wrote:
> You must register the URL with your xpath engine before you can use it
> in searches. So, with nokogiri I would:
>
>

Aha! Thanks for helping me figure out the right direction.

The XMPP spec and extensions are huge, with lots of namespaces. It would be nice to not have to ship a hard-coded list, or to require developers to touch several places just to match a stanza. So I've modified your code as follows:

  doc = Nokogiri::XML(<<-eoxml)
  <iq ... xml:lang='en' type='get' id='43'>
    <query xmlns='http://jabber.org/protocol/disco#info'/>
  </iq>
  eoxml

  # Find any query node
  doc.xpath('//iq/xmlns:query', doc.collect_namespaces).each do |node|
    p node
  end

The advantage is that all namespaces are registered automatically and should be available for mapping. I'm not sure this gets me anything, though, because it doesn't let me match the specific URI.

So my next question is, if I've registered a namespace with the parser, can I check the URI against a value given in the xpath expression? Or does the URI have to be registered beforehand? I guess what I'm getting at is that the URI equality check seems to be defined and run in the context of the xpath method. Can I do something like:

//iq/xmlns:query[namespace-uri() = 'http://jabber.org/protocol/disco#info']

once I've registered the namespace?

Thanks.

From aaron.patterson at gmail.com Sat Mar 7 22:14:14 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 7 Mar 2009 19:14:14 -0800
Subject: [Nokogiri-talk] Xpath and namespaces
In-Reply-To: <49B33598.9090403@thewordnerd.info>
References: <49B2DB67.9040802@thewordnerd.info> <6959e1680903071804o548301abifc61ef2f9716dbf9@mail.gmail.com> <49B33598.9090403@thewordnerd.info>
Message-ID: <6959e1680903071914h247893a2qb09004a247583b98@mail.gmail.com>

On Sat, Mar 7, 2009 at 7:03 PM, Nolan Darilek wrote:
> On 03/07/2009 08:04 PM, Aaron Patterson wrote:
>>
>> You must register the URL with your xpath engine before you can use it
>> in searches.  So, with nokogiri I would:
>>
>>
>
> Aha! Thanks for helping me figure out the right direction.
>
> The XMPP spec and extensions are huge, with lots of namespaces. It would be
> nice to not have to ship a hard-coded list, or to require developers to
> touch several places just to match a stanza. So I've modified your code as
> follows:
>
> doc = Nokogiri::XML(<<-eoxml)
> <iq ...
> xml:lang='en' type='get' id='43'><query
> xmlns='http://jabber.org/protocol/disco#info'/></iq>
> eoxml
>
> # Find any query node
> doc.xpath('//iq/xmlns:query', doc.collect_namespaces).each do |node|
>   p node
> end
>
> The advantage is that all namespaces are registered automatically and should
> be available for mapping. I'm not sure this gets me anything, though,
> because it doesn't let me match the specific URI.
>
> So my next question is, if I've registered a namespace with the parser, can
> I check the URI against a value given in the xpath expression? Or does the
> URI have to be registered beforehand? I guess what I'm getting at is that
> the URI equality check seems to be defined and run in the context of the
> xpath method. Can I do something like:
>
> //iq/xmlns:query[namespace-uri() = 'http://jabber.org/protocol/disco#info']
>
> once I've registered the namespace?
By registering the namespace and using namespace matches in your xpath, the node *must* belong to the namespace (and URL) which you have specified. Namespaces are *always* urls, and by registering them, you are just telling the xpath engine what alias to use when matching. Here is an example which may help: http://gist.github.com/75553 By registering the namespace and using the namespace alias in your xpath, you guarantee that the url in the namespace matches the url which you have registered. hth. -- Aaron Patterson http://tenderlovemaking.com/ From nolan at thewordnerd.info Sun Mar 8 00:33:56 2009 From: nolan at thewordnerd.info (Nolan Darilek) Date: Sat, 07 Mar 2009 23:33:56 -0600 Subject: [Nokogiri-talk] Xpath and namespaces References: <49B2DB67.9040802@thewordnerd.info> <6959e1680903071804o548301abifc61ef2f9716dbf9@mail.gmail.com> <49B33598.9090403@thewordnerd.info> <6959e1680903071914h247893a2qb09004a247583b98@mail.gmail.com> Message-ID: <49B358C4.9010200@thewordnerd.info> On 03/07/2009 09:14 PM, Aaron Patterson wrote: >> > By registering the namespace and using namespace matches in your > xpath, the node *must* belong to the namespace (and URL) which you > have specified. Namespaces are *always* urls, and by registering > them, you are just telling the xpath engine what alias to use when > matching. > > Cool, thanks for all your help and patience. I've found a solution that works. Now I'm just trying to make it look nicer. It may be, though, that something this unholy just can't be made to look nice. :P I discovered custom functions and wrote one that, when given a node set, finds all nodes with the specified name and namespace URI. It's probably incredibly naive, but I think all the XMPP namespaces use the default namespace, so it should do the job. 
Here's the code now:

  doc = Nokogiri::XML(<<-eoxml)
  <iq ... xml:lang='en' type='get' id='43'>
    <query xmlns='http://jabber.org/protocol/disco#info'/>
  </iq>
  eoxml

  class CustomXpath
    def namespace(set, name, ns)
      set.find_all.each do |n|
        n.name == name && n.namespaces.values.include?(ns)
      end
    end
  end

  doc.xpath("//iq/*[namespace(., 'query', 'http://jabber.org/protocol/disco#info')]", CustomXpath.new).each do |node|
    p node
  end

  doc.xpath("//iq/*[namespace(., 'query', 'http://jabber.org/protocol/disco#inf')]", CustomXpath.new).each do |node|
    p node # Nothing
  end

  doc.xpath("//iq/*[namespace(., 'qry', 'http://jabber.org/protocol/disco#info')]", CustomXpath.new).each do |node|
    p node # Nothing
  end

The first nodeset argument seems superfluous, though. It would be nice if I could eliminate the first argument, or the *. Is there any way to do that? I thought perhaps I could pass something into the CustomXpath constructor that would let the method access the current context, but I don't immediately see a way to do that. I'll keep looking, though. :)

Thanks again. Even if I can't eliminate the argument, this looks like a better solution than having to collect a bunch of namespaces and try to memorize how they map from specifications to symbols or instance variables. :)

From aaron.patterson at gmail.com Sun Mar 8 01:35:30 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 7 Mar 2009 22:35:30 -0800
Subject: [Nokogiri-talk] Xpath and namespaces
In-Reply-To: <49B358C4.9010200@thewordnerd.info>
References: <49B2DB67.9040802@thewordnerd.info> <6959e1680903071804o548301abifc61ef2f9716dbf9@mail.gmail.com> <49B33598.9090403@thewordnerd.info> <6959e1680903071914h247893a2qb09004a247583b98@mail.gmail.com> <49B358C4.9010200@thewordnerd.info>
Message-ID: <6959e1680903072235g2946495bo176605de1ef4058a@mail.gmail.com>

On Sat, Mar 7, 2009 at 9:33 PM, Nolan Darilek wrote:
> On 03/07/2009 09:14 PM, Aaron Patterson wrote:
>>>
>>>
>>
>> By registering the namespace and using namespace matches in your
>> xpath, the node *must* belong to the namespace (and URL) which you
>> have specified.
>> Namespaces are *always* urls, and by registering
>> them, you are just telling the xpath engine what alias to use when
>> matching.
>>
>>
>
> Cool, thanks for all your help and patience. I've found a solution that
> works. Now I'm just trying to make it look nicer. It may be, though, that
> something this unholy just can't be made to look nice. :P
>
> I discovered custom functions and wrote one that, when given a node set,
> finds all nodes with the specified name and namespace URI. It's probably
> incredibly naive, but I think all the XMPP namespaces use the default
> namespace, so it should do the job. Here's the code now:
>
> doc = Nokogiri::XML(<<-eoxml)
> <iq ...
> xml:lang='en' type='get' id='43'><query
> xmlns='http://jabber.org/protocol/disco#info'/></iq>
> eoxml
>
> class CustomXpath
>   def namespace(set, name, ns)
>     set.find_all.each do |n|
>       n.name == name && n.namespaces.values.include?(ns)
>     end
>   end
> end
>
> doc.xpath("//iq/*[namespace(., 'query',
> 'http://jabber.org/protocol/disco#info')]", CustomXpath.new).each do |node|
>   p node
> end
>
> doc.xpath("//iq/*[namespace(., 'query',
> 'http://jabber.org/protocol/disco#inf')]", CustomXpath.new).each do |node|
>   p node # Nothing
> end
>
> doc.xpath("//iq/*[namespace(., 'qry',
> 'http://jabber.org/protocol/disco#info')]", CustomXpath.new).each do |node|
>   p node # Nothing
> end
>
> The first nodeset argument seems superfluous, though. It would be nice if I
> could eliminate the first argument, or the *.  Is there any way to do that? I
> thought perhaps I could pass something into the CustomXpath constructor that
> would let the method access the current context, but I don't immediately see
> a way to do that. I'll keep looking, though. :)
>
> Thanks again. Even if I can't eliminate the argument, this looks like a
> better solution than having to collect a bunch of namespaces and try to
> memorize how they map from specifications to symbols or instance variables.

I'm not quite clear on your requirements.
Is it really necessary to have the URL somewhere in your xpath? This code yields exactly the same results, is shorter, and seems much more clear to me:

  doc = Nokogiri::XML(<<-eoxml)
  <iq ... xml:lang='en' type='get' id='43'>
    <query xmlns='http://jabber.org/protocol/disco#info'/>
  </iq>
  eoxml

  ns_set = {
    'info' => 'http://jabber.org/protocol/disco#info',
    'inf' => 'http://jabber.org/protocol/disco#inf', # Doesn't exist
  }

  doc.xpath("//iq/info:query", ns_set).each do |node|
    p node
  end

  doc.xpath("//iq/inf:query", ns_set).each do |node|
    p node # Nothing
  end

  doc.xpath("//iq/info:qry", ns_set).each do |node|
    p node # Nothing
  end

Anyway, I'm glad you found what you need regardless! :-)

--
Aaron Patterson
http://tenderlovemaking.com/

From nolan at thewordnerd.info Sun Mar 8 12:52:37 2009
From: nolan at thewordnerd.info (Nolan Darilek)
Date: Sun, 08 Mar 2009 11:52:37 -0500
Subject: [Nokogiri-talk] Xpath and namespaces
References: <49B2DB67.9040802@thewordnerd.info> <6959e1680903071804o548301abifc61ef2f9716dbf9@mail.gmail.com> <49B33598.9090403@thewordnerd.info> <6959e1680903071914h247893a2qb09004a247583b98@mail.gmail.com> <49B358C4.9010200@thewordnerd.info> <6959e1680903072235g2946495bo176605de1ef4058a@mail.gmail.com>
Message-ID: <49B3F7D5.5010705@thewordnerd.info>

On 03/08/2009 12:35 AM, Aaron Patterson wrote:
> I'm not quite clear on your requirements. Is it really necessary to
> have the URL somewhere in your xpath? This code yields exactly the
> same results, is shorter, and seems much more clear to me:
>
>

You're thinking in terms of code. My xpath expressions are in a yaml document. Having to add the namespaces to the code would require either defining the xpath in ruby, or creating a separate mapping of namespaces to strings. So, to add a route, I'd have to touch two separate areas in framework code--one to define the xpath, another to define the alias for the namespace. Just seems much easier to put the namespace URI right in the xpath.

So, given that requirement, is there any way to remove the node_set argument from the namespace function, so that *[namespace("query", ...)] works?

From aaron.patterson at gmail.com Sun Mar 8 14:22:36 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sun, 8 Mar 2009 10:22:36 -0800
Subject: [Nokogiri-talk] Xpath and namespaces
In-Reply-To: <49B3F7D5.5010705@thewordnerd.info>
References: <49B2DB67.9040802@thewordnerd.info> <6959e1680903071804o548301abifc61ef2f9716dbf9@mail.gmail.com> <49B33598.9090403@thewordnerd.info> <6959e1680903071914h247893a2qb09004a247583b98@mail.gmail.com> <49B358C4.9010200@thewordnerd.info> <6959e1680903072235g2946495bo176605de1ef4058a@mail.gmail.com> <49B3F7D5.5010705@thewordnerd.info>
Message-ID: <6959e1680903081122x19f0a088wa332a588a3d9bc57@mail.gmail.com>

On Sun, Mar 8, 2009 at 8:52 AM, Nolan Darilek wrote:
> On 03/08/2009 12:35 AM, Aaron Patterson wrote:
>>
>> I'm not quite clear on your requirements. Is it really necessary to
>> have the URL somewhere in your xpath?  This code yields exactly the
>> same results, is shorter, and seems much more clear to me:
>>
>>
>
> You're thinking in terms of code. My xpath expressions are in a yaml
> document. Having to add the namespaces to the code would require either
> defining the xpath in ruby, or creating a separate mapping of namespaces to
> strings. So, to add a route, I'd have to touch two separate areas in
> framework code--one to define the xpath, another to define the alias for the
> namespace. Just seems much easier to put the namespace URI right in the
> xpath.

Ah. Gotcha.

> So, given that requirement, is there any way to remove the node_set argument
> from the namespace function, so that *[namespace("query", ...)] works?

Not really. The function needs to return a list of nodes that match your criteria. How will you return a list of search results without a list to search through?

--
Aaron Patterson
http://tenderlovemaking.com/

From nolan at thewordnerd.info Sun Mar 8 15:10:46 2009
From: nolan at thewordnerd.info (Nolan Darilek)
Date: Sun, 08 Mar 2009 14:10:46 -0500
Subject: [Nokogiri-talk] Xpath and namespaces
References: <49B2DB67.9040802@thewordnerd.info> <6959e1680903071804o548301abifc61ef2f9716dbf9@mail.gmail.com> <49B33598.9090403@thewordnerd.info> <6959e1680903071914h247893a2qb09004a247583b98@mail.gmail.com> <49B358C4.9010200@thewordnerd.info> <6959e1680903072235g2946495bo176605de1ef4058a@mail.gmail.com> <49B3F7D5.5010705@thewordnerd.info> <6959e1680903081122x19f0a088wa332a588a3d9bc57@mail.gmail.com>
Message-ID: <49B41836.1010700@thewordnerd.info>

On 03/08/2009 01:22 PM, Aaron Patterson wrote:
> Not really. The function needs to return a list of nodes that match
> your criteria. How will you return a list of search results without a
> list to search through?
>
>

By passing a reference to a parser pointer of some kind to the custom handler class when it is instantiated. By having access to some sort of context, the custom class could base all of its searches from the current location in the document, the list of nodes matched by the latest expression, etc.

Doesn't look like this is currently supported, though. Is it possible? If so, should I file a ticket somewhere?

From julien.genestoux at gmail.com Sun Mar 8 22:25:26 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Sun, 8 Mar 2009 19:25:26 -0700
Subject: [Nokogiri-talk] Equivalent to skip_instruct?
Message-ID: <26c0cf900903081925p4cab1206yf666107b685fda2b@mail.gmail.com>

Hey,

When converting Nokogiri docs to xml (string), I am trying to skip the "instruct" part of the generation. Do you think it is possible?

Thanks,

--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From phlip2005 at gmail.com Mon Mar 9 00:04:23 2009
From: phlip2005 at gmail.com (Phlip)
Date: Sun, 08 Mar 2009 21:04:23 -0700
Subject: [Nokogiri-talk] to_xhtml( :indent => 2 ) ?
Message-ID: <49B49547.9030304@gmail.com>

Nokogiri:

I need pretty-printed HTML, and I can't find 'indent' in N's test folder.

What's the shortest path to this feature?

--
  Phlip

From julien.genestoux at gmail.com Tue Mar 10 20:34:17 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Tue, 10 Mar 2009 17:34:17 -0700
Subject: [Nokogiri-talk] Problems with the SAX Parser
Message-ID: <26c0cf900903101734l106f75e4p5609553f944f3a31@mail.gmail.com>

Hey Nokogiri-crowd,

I've been doing some testing for Babylon (a ruby gem to create XMPP applications easily) and I've bumped into something weird. I am not sure whether it is a Nokogiri/LibXML2 bug, or something that we did wrong in our implementation, but basically, some xml elements are "ignored" from the pushed stream.

It is pretty hard to explain, and as I don't want you to spend time on setting up Babylon, I've isolated the failing code into this gist : http://gist.github.com/77215

Basically, I am simulating an XMPP stream with an eventmachine and timers. I am sending what a regular stream contains and, as you'll see, the last XML items are not parsed correctly.

I would really appreciate some help to find out what's wrong here (again, it can just be Babylon's implementation which is not correct...)

I have tried to split between "><" and add a newline. It works fine then, but I have no control over that so there has to be another solution.

Thanks,

Julien

--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From julien.genestoux at gmail.com Tue Mar 10 21:11:19 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Tue, 10 Mar 2009 18:11:19 -0700
Subject: [Nokogiri-talk] Problems with the SAX Parser
In-Reply-To: <26c0cf900903101734l106f75e4p5609553f944f3a31@mail.gmail.com>
References: <26c0cf900903101734l106f75e4p5609553f944f3a31@mail.gmail.com>
Message-ID: <26c0cf900903101811n672a7d31k570c159d2855211b@mail.gmail.com>

As expected, it was our implementation that failed ;) We actually need to use the Nokogiri::XML::SAX::PushParser and not the regular parser.

Sorry for the waste of internet resources ;)

Julien

--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29

On Tue, Mar 10, 2009 at 5:34 PM, Julien Genestoux < julien.genestoux at gmail.com> wrote:
> Hey Nokogiri-crowd,
>
> I've been doing some testing for Babylon (a ruby gem to create XMPP
> applications easily) and I've bumped into something weird.
> I am not sure it is a Nokogiri/LibXML2 bug, or something that we did wrong
> in our implementation, but basically, some xml elements are "ignored" from
> the pushed stream.
>
> It is pretty hard to explain, and as I don't want you to spend time on
> setting up Babylon, I've isolated the failing code into this gist :
> http://gist.github.com/77215
>
> Basically, I am simulating an XMPP stream with an eventmachine and timers. I
> am sending what a regular stream contains and, as you'll see, the last XML
> items are not parsed correctly.
>
> I would really appreciate some help to find out what's wrong here (again,
> it can just be Babylon's implementation which is not correct...)
>
> I have tried to split between "><" and add a newline. It works fine then,
> but I have no control over that so there has to be another solution.
> > Thanks, > > Julien > > > > > > -- > Julien Genestoux > http://www.ouvre-boite.com > http://blog.notifixio.us > > +1 (415) 254 7340 > +33 (0)9 70 44 76 29 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From phlip2005 at gmail.com Tue Mar 10 23:52:20 2009 From: phlip2005 at gmail.com (Phlip) Date: Tue, 10 Mar 2009 20:52:20 -0700 Subject: [Nokogiri-talk] assume the position Message-ID: <49B73574.9090009@gmail.com> Nokogiri: Given a Nokogiri::XML::Node, how do I detect its relative position among its parent's children? Should I just look for it among its parent's children? And how do I detect its absolute position within the DOM? Should I start at the root and do depth-first in-order traversal, counting, until I find it? I will do those things, yet it seems that I'm replicating XPath's position() function, so if there's a simpler way... I'm also open to alternate suggestions for my ulterior algorithm here. I'm trying to upgrade http://gist.github.com/76136 so it complains if your HTML puts nodes in a different order than your spec... -- Phlip From aaron.patterson at gmail.com Wed Mar 11 00:13:30 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Tue, 10 Mar 2009 21:13:30 -0700 Subject: [Nokogiri-talk] assume the position In-Reply-To: <49B73574.9090009@gmail.com> References: <49B73574.9090009@gmail.com> Message-ID: <6959e1680903102113n13c5b578u2ee845f50fb9891b@mail.gmail.com> On Tue, Mar 10, 2009 at 8:52 PM, Phlip wrote: > Nokogiri: > > Given a Nokogiri::XML::Node, how do I detect its relative position among its > parent's children? Should I just look for it among its parent's children? Not too easily right now. Just find it among its parents children. > And how do I detect its absolute position within the DOM? Should I start at > the root and do depth-first in-order traversal, counting, until I find it? Nokogiri::XML::Node#path will return the complete xpath for the node. 
> I will do those things, yet it seems that I'm replicating XPath's position()
> function, so if there's a simpler way...
>
> I'm also open to alternate suggestions for my ulterior algorithm here. I'm
> trying to upgrade http://gist.github.com/76136 so it complains if your HTML
> puts nodes in a different order than your spec...
>
> --
>   Phlip
>
> _______________________________________________
> Nokogiri-talk mailing list
> Nokogiri-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/nokogiri-talk
>

--
Aaron Patterson
http://tenderlovemaking.com/

From phlip2005 at gmail.com Wed Mar 11 00:33:06 2009
From: phlip2005 at gmail.com (Phlip)
Date: Tue, 10 Mar 2009 21:33:06 -0700
Subject: [Nokogiri-talk] assume the position
In-Reply-To: <6959e1680903102113n13c5b578u2ee845f50fb9891b@mail.gmail.com>
References: <49B73574.9090009@gmail.com> <6959e1680903102113n13c5b578u2ee845f50fb9891b@mail.gmail.com>
Message-ID: <49B73F02.3030400@gmail.com>

Aaron Patterson wrote:

>> Given a Nokogiri::XML::Node, how do I detect its relative position among its
>> parent's children? Should I just look for it among its parent's children?
>
> Not too easily right now. Just find it among its parents children.
>
>> And how do I detect its absolute position within the DOM? Should I start at
>> the root and do depth-first in-order traversal, counting, until I find it?
>
> Nokogiri::XML::Node#path will return the complete xpath for the node.

Is "/html/body/form" higher or lower in document order than "/html/body/ul"? You can't tell, can you. Yet document order is indeed a first-class XML concept.

Is there an (agreeably demented) system to emit the path like this?

  "/*[ 1]/*[ 1]"
  "/*[ 1]/*[ 1]/*[ 1]"
  "/*[ 1]/*[ 1]/*[ 1]/*[ 2]"
  "/*[ 1]/*[ 1]/*[ 1]/*[ 3]"

Then you can simply sort the paths to get back their document order. It's just a thought.
The outer topic is: Given a testable HTML page, build this builder script: form :action => '/users' do fieldset do legend 'Personal Information' label 'First name' input :type => 'text', :name => 'user[first_name]' end end Now locate in that HTML the tree that most closely matches that tree. Skip intermediate tags (such as
  • , for example), and skip attributes which the builder-ed HTML does not contain. If you can find such a tree, then that script passes, as an assertion. But node position should matter. The current code would pass even if the actual HTML put the requested