From noreply at rubyforge.org Sun May 4 10:54:43 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Sun, 4 May 2008 10:54:43 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-19935 ] Impossible specify encoding for HTMLParser
Message-ID: <20080504145444.36ED31858696@rubyforge.org>

Bugs item #19935, was opened at 2008-05-04 10:54
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=19935&group_id=494

Category: General
Group: v0.5
Status: Open
Resolution: None
Priority: 3
Submitted By: Nobody (None)
Assigned to: Nobody (None)
Summary: Impossible specify encoding for HTMLParser

Initial Comment:
HTMLParser doesn't have any method to set the document encoding when automatic detection gets it wrong.

From psharp at teladoc.com Mon May 12 15:16:59 2008
From: psharp at teladoc.com (Patrick Sharp)
Date: Mon, 12 May 2008 14:16:59 -0500
Subject: [libxml-devel] errors
Message-ID: <2D79F721-2DDC-4FB3-BA11-1A6CD9A73FCE@teladoc.com>

Hi,

I'm looking for the Ruby version of the PHP function libxml_use_internal_errors. Am I missing it in the docs?

Thanks,

Patrick Sharp
Software Developer
TelaDoc Medical Services
972-865-2648

From noreply at rubyforge.org Tue May 13 07:48:54 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Tue, 13 May 2008 07:48:54 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-20117 ] Segfault when using XML::Reader.expand
Message-ID: <20080513114854.3D8D018585C1@rubyforge.org>

Bugs item #20117, was opened at 2008-05-13 13:48
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=20117&group_id=494

Category: memory
Group: v0.5
Status: Open
Resolution: None
Priority: 3
Submitted By: Gabriel Ebner (gebner)
Assigned to: Nobody (None)
Summary: Segfault when using XML::Reader.expand

Initial Comment:
The following program segfaults under libxml-ruby 0.5.4 (gem) and ruby 1.8.6 (debian sid / amd64):

require 'rubygems'
require 'xml/libxml'

reader = XML::Reader.new ''
2.times do
  reader.read
end
reader.expand if reader.name == 'b'

(gdb) run bug.rb
Starting program: /usr/bin/ruby bug.rb
[Thread debugging using libthread_db enabled]
[New Thread 0x7f81f824b6e0 (LWP 1298)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f81f824b6e0 (LWP 1298)]
classname (klass=257) at variable.c:150
150     variable.c: No such file or directory.
        in variable.c
(gdb) bt
#0  classname (klass=257) at variable.c:150
#1  0x00007f81f7e0094d in rb_class_path (klass=257) at variable.c:190
#2  0x00007f81f7e00a99 in rb_class2name (klass=257) at variable.c:301
#3  0x00007f81f7d8a13a in rb_check_type (x=257, t=34) at error.c:276
#4  0x00007f81f69f59b8 in ruby_xml_node_deregisterNode (node=0x24f6ba0) at ruby_xml_node.c:2264
#5  0x00007f81f676fe3a in xmlTextReaderFreeDoc (reader=0x2510ee0, cur=0x24f6ba0) at xmlreader.c:482
#6  0x00007f81f677046d in xmlFreeTextReader__internal_alias (reader=0x2510ee0) at xmlreader.c:2195
#7  0x00007f81f7daac81 in rb_gc_call_finalizer_at_exit () at gc.c:1928
#8  0x00007f81f7d90293 in ruby_finalize_1 () at eval.c:1553
#9  0x00007f81f7d98e51 in ruby_cleanup (ex=0) at eval.c:1590
#10 0x00007f81f7d98f69 in ruby_stop (ex=257) at eval.c:1645
#11 0x00007f81f7da46ff in ruby_run () at eval.c:1666
#12 0x0000000000400883 in main (argc=2, argv=0x7fff002691e8, envp=) at main.c:48
(gdb)

From noreply at rubyforge.org Tue May 13 08:11:40 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Tue, 13 May 2008 08:11:40 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-20117 ] Segfault when using XML::Reader.expand
Message-ID: <20080513121143.4DBC418585B8@rubyforge.org>

Bugs item #20117, was opened at 2008-05-13 13:48
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=20117&group_id=494

Category: memory
Group: v0.5
Status: Open
Resolution: None
Priority: 3
Submitted By: Gabriel Ebner (gebner)
Assigned to: Nobody (None)
Summary: Segfault when using XML::Reader.expand

----------------------------------------------------------------------

>Comment By: Gabriel Ebner (gebner)
Date: 2008-05-13 14:11

Message:
I think I have found the problem.

The xmlTextReader usually deletes the old node when you call xmlTextReaderRead. This is why we're getting segfaults when freeing the node.

You can prevent this deletion by calling xmlTextReaderPreserve, but this will still cause double frees on the document. You can prevent that too by calling xmlTextReaderCurrentDoc, which will incidentally prevent the document from being freed automatically.
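Gabriel's diagnosis is about which side owns each allocation. The sketch below is a toy model in plain Ruby: the Heap and ToyReader classes and the run_scenario helper are hypothetical and only mimic the ownership rules as he describes them. It does not call libxml2, whose real bookkeeping is more involved. Touching or freeing an object that is no longer live stands in for the segfault.

```ruby
# Toy ownership model: hypothetical classes, not libxml2 or libxml-ruby API.
class Heap
  class InvalidAccess < StandardError; end

  def initialize
    @live = {}
  end

  def alloc(obj)
    @live[obj] = true
    obj
  end

  # Freeing an object that is no longer live models a double free.
  def free(obj)
    raise InvalidAccess, "free of dead #{obj}" unless @live.delete(obj)
  end

  # Using an object that is no longer live models a use after free.
  def touch(obj)
    raise InvalidAccess, "use of dead #{obj}" unless @live.key?(obj)
    obj
  end
end

class ToyReader
  def initialize(heap)
    @heap = heap
    @doc = heap.alloc(:doc)
    @current = nil
    @preserved = []
    @owns_doc = true
  end

  # Advancing frees the previous node unless it was preserved
  # (the xmlTextReaderRead behaviour described in the comment above).
  def read(name)
    drop_current
    @current = @heap.alloc(name)
  end

  # Keep the current node alive (the role of xmlTextReaderPreserve).
  def preserve
    @preserved << @current
    @current
  end

  # Hand the document to the caller so the reader no longer frees it
  # (the side effect attributed to xmlTextReaderCurrentDoc).
  def current_doc
    @owns_doc = false
    @doc
  end

  # Reader teardown (xmlFreeTextReader in the backtrace).
  def close
    drop_current
    return unless @owns_doc
    @preserved.each { |node| @heap.free(node) } # preserved nodes live in the doc
    @heap.free(@doc)
  end

  private

  def drop_current
    @heap.free(@current) if @current && !@preserved.include?(@current)
    @current = nil
  end
end

# The Ruby side keeps node :b (standing in for the node returned by
# reader.expand) and, during its own cleanup, touches that node and then
# frees the node and the document.
def run_scenario(preserve, take_doc)
  heap = Heap.new
  reader = ToyReader.new(heap)
  reader.read(:a)
  node = reader.read(:b)
  reader.preserve if preserve
  reader.current_doc if take_doc
  reader.close
  heap.touch(node)
  heap.free(node)
  heap.free(:doc)
  :ok
rescue Heap::InvalidAccess
  :crash
end

if __FILE__ == $0
  [[false, false], [true, false], [false, true], [true, true]].each do |p, d|
    puts "preserve=#{p} current_doc=#{d}: #{run_scenario(p, d)}"
  end
end
```

In this model only the combination of preserving the node and taking over the document leaves the Ruby side's cleanup safe, which is the combination the comment arrives at.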
From marc at bloodnok.com Mon May 26 20:17:46 2008
From: marc at bloodnok.com (Marc Munro)
Date: Mon, 26 May 2008 17:17:46 -0700
Subject: [libxml-devel] Status of libxml?
Message-ID: <1211847466.19645.3.camel@bloodnok.com>

I'm curious about the development status of libxml.

My application still core dumps fairly regularly, though not in any way that has proven useful for tracking down problems.

Is there any active development looking into the memory problems?

What is the status of libxml and libxsl for ruby 1.9?

Thanks.

__
Marc

From danj at 3skel.com Tue May 27 07:20:05 2008
From: danj at 3skel.com (Dan Janowski)
Date: Tue, 27 May 2008 07:20:05 -0400
Subject: [libxml-devel] Status of libxml?
In-Reply-To: <1211847466.19645.3.camel@bloodnok.com>
References: <1211847466.19645.3.camel@bloodnok.com>
Message-ID: <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com>

I have been unable to continue work, as my personal life has not allowed it. However, the effort I have put into this has not cleared the library of all its problems, and it really needs the active, shared involvement of more than just a single developer. This is sort of asking for manna, but the number of people using this library and the number of issues with it are too much for a single developer who needs to make a living.

Dan

From saurabhnanda at gmail.com Tue May 27 08:28:23 2008
From: saurabhnanda at gmail.com (Saurabh Nanda)
Date: Tue, 27 May 2008 17:58:23 +0530
Subject: [libxml-devel] Status of libxml?
In-Reply-To: <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com>
References: <1211847466.19645.3.camel@bloodnok.com> <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com>
Message-ID: <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com>

I shifted out of libxml to the (much) slower REXML and an ad-hoc over-the-network Java based XSD validation system. I would love to get libxml-ruby working again. A lot of my projects will depend on it.

Dan, how can other people help? Is there someone who can coordinate other people's efforts in the right direction?

Saurabh.

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

From robert.fischer at smokejumperit.com Tue May 27 10:41:03 2008
From: robert.fischer at smokejumperit.com (Robert Fischer)
Date: Tue, 27 May 2008 09:41:03 -0500
Subject: [libxml-devel] Status of libxml?
In-Reply-To: <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com>
References: <1211847466.19645.3.camel@bloodnok.com> <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com> <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com>
Message-ID: <483C1D7F.5080906@smokejumperit.com>

REXML is a mess, and not just because it's slower. Currently, the consensus seems to be to use Hpricot's XML mode until libxml stabilizes.

~~ Robert.

From sean at chittenden.org Tue May 27 10:58:07 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Tue, 27 May 2008 07:58:07 -0700
Subject: [libxml-devel] Status of libxml?
In-Reply-To: <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com>
References: <1211847466.19645.3.camel@bloodnok.com> <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com> <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com>
Message-ID: <4A0AD3FB-6ED9-4C8F-9126-8BA1C4BFFB65@chittenden.org>

> I shifted out of libxml to the (much) slower REXML and an ad-hoc
> over-the-network Java based XSD validation system. I would love to get
> libxml-ruby working again. A lot of my projects will depend on it.
>
> Dan, how can other people help? Is there someone who can coordinate
> other people's efforts in the right direction?

Sean's Posit #6: The group of people who need fast XML processing and the group of people who know C are almost completely orthogonal technical groups.

Ruby (or more specifically Rails) has turned into a kind of open source refuge for .NET developers who lack much clue, so the signal-to-noise ratio for most Ruby projects is... atypical and out of line.

What the project needs are people who can actually program in C, understand XML, and are in a position to spend cycles on this problem. As is, there aren't many who have these two essentially non-overlapping skills, and rarer still are people who have cycles to spend on the thankless job of library development.

-sc

--
Sean Chittenden
sean at chittenden.org

From danj.3skel at gmail.com Tue May 27 11:43:41 2008
From: danj.3skel at gmail.com (Dan Janowski)
Date: Tue, 27 May 2008 11:43:41 -0400
Subject: [libxml-devel] Status of libxml?
In-Reply-To: <4A0AD3FB-6ED9-4C8F-9126-8BA1C4BFFB65@chittenden.org>
References: <1211847466.19645.3.camel@bloodnok.com> <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com> <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com> <4A0AD3FB-6ED9-4C8F-9126-8BA1C4BFFB65@chittenden.org>
Message-ID: <6ee5862e0805270843i142db524t66cafc23eda4ca30@mail.gmail.com>

The intersection is a null set. The best hope may be Rubinius, where extensions can be written in Ruby. Fixing the memory model for nodes and documents made a big difference, but the problems that linger are elusive and compounded by Ruby's obfuscating GC. I put a lot of time into it, but the maintenance is clearly more demanding than I, or any one person, can provide. So here we are.

From marc at bloodnok.com Wed May 28 10:55:37 2008
From: marc at bloodnok.com (Marc Munro)
Date: Wed, 28 May 2008 07:55:37 -0700
Subject: [libxml-devel] A couple of questions (was Re: Status of libxml?)
In-Reply-To: <6ee5862e0805270843i142db524t66cafc23eda4ca30@mail.gmail.com>
References: <1211847466.19645.3.camel@bloodnok.com> <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com> <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com> <4A0AD3FB-6ED9-4C8F-9126-8BA1C4BFFB65@chittenden.org> <6ee5862e0805270843i142db524t66cafc23eda4ca30@mail.gmail.com>
Message-ID: <1211986537.7551.22.camel@bloodnok.com>

Well, it's not quite a null set. I can write C, and I have some knowledge of XML. My problem is being able to spare the cycles. I am using libxml for my own free software project and I would rather spend my free cycles on that than on the supporting infrastructure.

I may have to change my mind though, as I appear to be at the limits of the current stability of libxml. It looks like my choices are contribute to libxml or rewrite what I have in another language (none of the other libraries for ruby are anything like fast enough). Right now I'm thinking that my project's long term future might be with a rewrite in D.

However I am prepared to spend a little time looking at the issues in libxml, and I have a question about the entire philosophy of the old reference counting memory management implementation, and the new mark-sweep based approach.

1) The reference-counting implementation allowed multiple ruby references to the same C structures: is this also true of the newer approach?
2) Is there a fundamental need for more than one ruby object to reference any libxml C object?

3) I am not that familiar with the Ruby memory management model but would the following rather different approach be feasible: when a ruby object is created the C object is given a reference to it, and when the ruby object is garbage collected the corresponding C object loses its reference and is then conditionally freed (if none of the associated nodes have remaining references)?

I am beginning to think, admittedly with little real evidence and a high likelihood of being wrong, that the fundamental stability problem arises more because of the fact that we allow the C objects to be shared by multiple ruby objects than anything else.

Comments, answers?

__
Marc

From danj at 3skel.com Wed May 28 11:23:21 2008
From: danj at 3skel.com (Dan Janowski)
Date: Wed, 28 May 2008 11:23:21 -0400
Subject: [libxml-devel] A couple of questions (was Re: Status of libxml?)
In-Reply-To: <1211986537.7551.22.camel@bloodnok.com>
References: <1211847466.19645.3.camel@bloodnok.com> <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com> <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com> <4A0AD3FB-6ED9-4C8F-9126-8BA1C4BFFB65@chittenden.org> <6ee5862e0805270843i142db524t66cafc23eda4ca30@mail.gmail.com> <1211986537.7551.22.camel@bloodnok.com>
Message-ID:

On May 28, 2008, at 10:55, Marc Munro wrote:

> Well, it's not quite a null set. I can write C, and I have some
> knowledge of XML. My problem is being able to spare the cycles. I am
> using libxml for my own free software project and I would rather spend
> my free cycles on that than on the supporting infrastructure.
>
> I may have to change my mind though, as I appear to be at the limits of
> the current stability of libxml. It looks like my choices are
It looks like my choices are > contribute to libxml or rewrite what I have in another language > (none of > the other libraries for ruby are anything like fast enough). Right > now > I'm thinking that my project's long term future might be with a > rewrite > in D. > > However I am prepared to spend a little time looking at the issues in > libxml, and I have a question about the entire philosophy of the old > reference counting memory management implementation, and the new > mark-sweep based approach. > > 1) The reference-counting implementation allowed multiple ruby > references to the same C structures: is this also true of the newer > approach? > The new model does not allow this. A single C object has at most one ruby peer object. If by traversal through the xml, by xpath or otherwise, a node is returned that already has a ruby peer, the same ruby object will be returned. If there is no ruby peer, it is allocated. > 2) Is there a fundamental need for more than one ruby object to > reference any libxml C object? > No need and many serious problems with having more than one-to-one associations. > 3) I am not that familiar with the Ruby memory management model but > would the following rather different approach be feasible: when a ruby > objects is created the C object is given a reference to it, and when > the > ruby object is garbage collected the corresponding C object loses its > reference and is then conditionally freed (if none of the associated > nodes have remaining references)? > That is how the new memory model works. The ruby peer has an internal void reference to the libxml object and the libxml object member _private holds either NULL when not peered or a VALUE to the peer. Ruby object free will reset _private to NULL and if the libxml node has no parents or enclosing document it calls free for libxml as well. 
If cleanup is initiated from the libxml side, every node is checked as it is reclaimed (via ruby_xml_node_deregisterNode) and dereferenced on its ruby peer.

> I am beginning to think, admittedly with little real evidence and a high
> likelihood of being wrong, that the fundamental stability problem arises
> more because of the fact that we allow the C objects to be shared by
> multiple ruby objects than anything else.

That was the single largest problem, and it should now be resolved. The issues that remain are as yet undefined.

From sean at chittenden.org Wed May 28 21:11:29 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Wed, 28 May 2008 18:11:29 -0700
Subject: [libxml-devel] A couple of questions (was Re: Status of libxml?)
In-Reply-To: <1211986537.7551.22.camel@bloodnok.com>
References: <1211847466.19645.3.camel@bloodnok.com> <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com> <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com> <4A0AD3FB-6ED9-4C8F-9126-8BA1C4BFFB65@chittenden.org> <6ee5862e0805270843i142db524t66cafc23eda4ca30@mail.gmail.com> <1211986537.7551.22.camel@bloodnok.com>
Message-ID: <70F2DDA0-D6BD-43A6-A46C-F5B8317D2D3B@chittenden.org>

> Well, it's not quite a null set. I can write C, and I have some
> knowledge of XML. My problem is being able to spare the cycles. I am
> using libxml for my own free software project and I would rather spend
> my free cycles on that than on the supporting infrastructure.

Welcome to my world circa 2004. <:~)

> I may have to change my mind though, as I appear to be at the limits of
> the current stability of libxml. It looks like my choices are
> contribute to libxml or rewrite what I have in another language (none of
> the other libraries for ruby are anything like fast enough). Right now
> I'm thinking that my project's long term future might be with a rewrite
> in D.

D is the new hotness and I don't imagine that changing any time soon. I was bullish about Ruby in '00-'01. Rubinius is a likely savior for Ruby, but D's got all the right things to be the next big big language if its compiler situation gets a bit more transparent (*kicks GCC's hardline stance for getting gdc incorporated as a base gcc lang*).

> 1) The reference-counting implementation allowed multiple ruby
> references to the same C structures: is this also true of the newer
> approach?

As Dan said, this is no more. It was ideal from a performance perspective, but very hard to maintain and brutal to program.

> I am beginning to think, admittedly with little real evidence and a high
> likelihood of being wrong, that the fundamental stability problem arises
> more because of the fact that we allow the C objects to be shared by
> multiple ruby objects than anything else.

The easiest way forward is to copy memory/nodes like it's going out of style and let the GC pick up the pieces.
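Dan's description of the new memory model (at most one Ruby peer per C node, recorded in the node's _private field and cleared from whichever side frees first) can be sketched as a toy model in plain Ruby. CNode, NodeWrapper.for, ruby_free and libxml_free are hypothetical names; the real binding does this in C, with the GC's free callback and ruby_xml_node_deregisterNode playing the roles modelled here.

```ruby
# Toy model of a one-to-one peer mapping; all names are hypothetical.
# CNode stands in for a libxml2 node; `private_peer` plays the role of its
# `_private` field, holding the single Ruby wrapper (or nil).
class CNode
  attr_accessor :private_peer
  attr_reader :name, :parent, :doc

  def initialize(name, parent = nil, doc = nil)
    @name = name
    @parent = parent
    @doc = doc
    @private_peer = nil
    @freed = false
  end

  def free!
    @freed = true
  end

  def freed?
    @freed
  end
end

class NodeWrapper
  attr_reader :cnode

  # Return the existing peer if the node has one; otherwise create one and
  # record it in the node, so every lookup yields the same Ruby object.
  def self.for(cnode)
    cnode.private_peer ||= new(cnode)
  end

  def initialize(cnode)
    @cnode = cnode
  end

  # Called when libxml2 reclaims the node: stop pointing at freed memory.
  def detach!
    @cnode = nil
  end

  # Called when the GC collects this wrapper.
  def ruby_free
    return if @cnode.nil?      # libxml2 already reclaimed the node
    @cnode.private_peer = nil  # the node is no longer peered
    # Only a detached node (no parent, no document) is freed from Ruby.
    @cnode.free! if @cnode.parent.nil? && @cnode.doc.nil?
    @cnode = nil
  end
end

# Cleanup started from the libxml2 side: clear the link in both directions,
# then free the node.
def libxml_free(cnode)
  peer = cnode.private_peer
  peer.detach! if peer
  cnode.private_peer = nil
  cnode.free!
end
```

The point of the model is that neither side can be left holding a pointer to something the other side has freed, because each free path clears the shared link first.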
-sc

--
Sean Chittenden
sean at chittenden.org

From stephen.bannasch at deanbrook.org Wed May 28 21:41:45 2008
From: stephen.bannasch at deanbrook.org (Stephen Bannasch)
Date: Wed, 28 May 2008 21:41:45 -0400
Subject: [libxml-devel] simple benchmarks comparing libxml to alternative ruby xml parsing solutions
In-Reply-To: <1211986537.7551.22.camel@bloodnok.com>
References: <1211847466.19645.3.camel@bloodnok.com> <6ee5862e0805270420n45487520ha21738049a12f96d@mail.gmail.com> <794f042d0805270528r6303d5c8q9c7d4938db2ef561@mail.gmail.com> <4A0AD3FB-6ED9-4C8F-9126-8BA1C4BFFB65@chittenden.org> <6ee5862e0805270843i142db524t66cafc23eda4ca30@mail.gmail.com> <1211986537.7551.22.camel@bloodnok.com>
Message-ID:

FYI: I've been using libxml in some projects and have been getting into JRuby, which gives me access to Java XML libraries from Ruby. I thought people on this list might be interested in some simple benchmarking I did a couple of months ago.

I'm hoping to use Hpricot for general XML processing instead of REXML or libxml in some projects, and I wanted to find out the speeds of different XML parsers in MRI and JRuby.

* I was very impressed by how much faster JRuby is when running on Java 1.6 than on 1.5. On Java 1.6, Hpricot in JRuby was only 10% slower than in MRI.

So far I've only got one test: parsing a 100k XML file and counting a certain type of element. I'm planning to add more tests that cover more of the kind of processing I need to do.

This is the test:

  Do this 100 times:
  - parse a 100k XML file and count the 466 leaf nodes

The results shown below are the times after a "rehearsal". The times for JRuby are faster once the JVM has been "warmed up". The rehearsal has no effect on the MRI timings.
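Stephen's own benchmark code is in the Subversion repository he links, and it is not reproduced here. As a sketch only: Ruby's standard Benchmark.bmbm runs every report twice, a rehearsal pass followed by the measured pass, which is one way to get the rehearsal-then-measure procedure described above. COUNT_SELF_CLOSING, build_sample and run_suite are hypothetical helpers; the regex stand-in is not an XML parser, and a real run would plug REXML, libxml-ruby or Hpricot calls in its place.

```ruby
require 'benchmark'

# Hypothetical stand-in for "parse the file and count the leaf nodes".
# It only counts self-closing tags with a regex; it is NOT an XML parser.
COUNT_SELF_CLOSING = lambda do |xml|
  xml.scan(%r{<[A-Za-z_][^<>]*/>}).size
end

# Build a synthetic document with `leaves` empty leaf elements.
def build_sample(leaves)
  body = Array.new(leaves) { |i| %(<leaf id="#{i}"/>) }.join
  "<root>#{body}</root>"
end

# Run each counter `iterations` times under Benchmark.bmbm (rehearsal pass,
# then measured pass) and return the last count seen for each label.
def run_suite(xml, iterations, parsers)
  counts = {}
  Benchmark.bmbm do |bm|
    parsers.each do |label, count_leaves|
      bm.report(label) do
        iterations.times { counts[label] = count_leaves.call(xml) }
      end
    end
  end
  counts
end

if __FILE__ == $0
  xml = build_sample(466)
  p run_suite(xml, 100, { 'regex stand-in' => COUNT_SELF_CLOSING })
end
```

Passing the parsers as a label-to-lambda hash keeps the timing loop identical for every library, so only the parse-and-count step differs between rows.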
Platform and method                           total time
--------------------------------------------------------
JRuby (Java 1.6.0)         jdom_document_builder   0.363
MRI:                       libxml                  0.389
JRuby (Java 1.6.0 server)  jdom_document_builder   0.412
JRuby (server)             jdom_document_builder   0.617
JRuby:                     jdom_document_builder   1.451
MRI:                       hpricot                 2.056
JRuby (Java 1.6.0 server)  hpricot                 2.272
JRuby (Java 1.6.0)         hpricot                 2.273
JRuby (server)             hpricot                 3.447
JRuby:                     hpricot                 6.198
JRuby (Java 1.6.0 server)  rexml                   6.251
JRuby (Java 1.6.0)         rexml                   6.356
MRI:                       rexml                   7.624
JRuby (server)             rexml                   9.609
JRuby:                     rexml                  12.944

* I'd also like to add tests for Ruby 1.9.

The timings reported here are taken from the second time the 100x loop is run for each platform/library test, so the JVM should be warmed up.

Tested on:
  MacBook Pro
  2.33 GHz Intel Core 2 Duo
  4 GB memory
  running MacOS X 10.5.2

Ruby versions tested:
  MRI:   ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]
  JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.5.0_13
  JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.6.0_03 (Soylatte)

Library versions MRI:
  libxml-ruby 0.5.4
  hpricot 0.6

Library versions JRuby:
  hpricot 0.6.161

More details are available in the links below:

Benchmark code and data checked into subversion here:
  https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks

Trac:
  http://trac.cosmos.concord.org/projects/browser/trunk/common/ruby/xml_benchmarks

* Hpricot uses code created by Ragel, a state machine compiler that can produce C or Java code, for the initial parsing. The Ragel => Java compiler can only produce one style of code generation and it is not the fastest. The style chosen by Hpricot for generating the C code produces a larger executable and is faster.
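The "only 10% slower" figure and the Java 1.5 versus 1.6 speedup can both be checked against the benchmark table. A small Ruby sketch of that arithmetic, with the totals copied from the table; treating the plain "JRuby:" rows as the Java 1.5.0_13 runs is an assumption based on the version list.

```ruby
# Totals copied from the benchmark table; lower is faster.
TOTALS = {
  'MRI: hpricot'                             => 2.056,
  'JRuby (Java 1.6.0) hpricot'               => 2.273,
  'JRuby (Java 1.6.0 server) hpricot'        => 2.272,
  'JRuby: hpricot'                           => 6.198,
  'JRuby (Java 1.6.0) rexml'                 => 6.356,
  'JRuby: rexml'                             => 12.944,
  'JRuby (Java 1.6.0) jdom_document_builder' => 0.363,
  'JRuby: jdom_document_builder'             => 1.451
}

# How many times longer the `slow` run took than the `fast` run.
def slowdown(slow, fast)
  TOTALS.fetch(slow) / TOTALS.fetch(fast)
end

# Percentage by which the `slow` run took longer than the `fast` run.
def percent_slower(slow, fast)
  (slowdown(slow, fast) - 1.0) * 100.0
end

if __FILE__ == $0
  printf("JRuby/Java 1.6 Hpricot vs MRI Hpricot: %.1f%% slower\n",
         percent_slower('JRuby (Java 1.6.0) hpricot', 'MRI: hpricot'))
  %w[hpricot rexml jdom_document_builder].each do |lib|
    printf("%s: Java 1.5 run took %.1fx as long as Java 1.6\n",
           lib, slowdown("JRuby: #{lib}", "JRuby (Java 1.6.0) #{lib}"))
  end
end
```

Since 2.273 / 2.056 is about 1.106, JRuby on Java 1.6 ran the Hpricot test roughly 10.6% slower than MRI, consistent with the "10%" remark; under the same assumption the Java 1.5 runs took about 2.7x (Hpricot), 2.0x (REXML) and 4.0x (JDOM) as long as the Java 1.6 runs.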