From rosco at roscopeco.co.uk Tue Dec 13 07:07:20 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Tue, 13 Dec 2005 12:07:20 -0000
Subject: [libxml-devel] Observations on code from xml-tools
Message-ID:

Hi,

Just wanted to say hello, and air a few thoughts, so I'll jump right in.

Firstly, from Bob's message yesterday:

> The project home will be the existing libxml project at rubyforge.org
> (http://rubyforge.org/projects/libxml/). We need to decide whether to maintain libxsl as a separate project or put its activity under libxml (I vote the latter, but don't have strong feelings one way or the other).

+1 for merging them.

> We need to figure out whether Sean Chittenden's last source or [Trans']
> is the best starting point.

To get a feel for the code I grabbed the source for both xml and xsl from the xml-tools CVS. Obviously this may not be applicable in the end but I just wanted to get time to play with the libraries and think about where we were at.

I'm quite new to Ruby and haven't interfaced it with C yet, and I've not done any 'real' C for a few years, so there are probably mistakes here, but I noted a few things based on that code:

* I had to make some small changes to compile with GCC 4 (-fno-builtin-atan and some lvalue casts which aren't supported anymore) but generally the build seemed fine and the libraries installed okay.

* I had to bring a few xml files back from the 'Outdated' directory for the tests. Libxml has three failing tests (two look quite trivial though). Libxslt has 9 failing tests but I suspect there may be other factors at play since it seems to work fine (at least for the fuzface test ( :D ) and some quick tests I put together).

From this it seems to me that we don't have that far to go to get both xml and xsl going as they stand, assuming we are basing on that code. In any case, I think we should maybe work to that point and get a release out before starting on the todos and so on.
That way we could also try to get some feedback from outside the project as to what people want from this software, and put together a set of use-cases and from there a 'formal' roadmap (I'm assuming this hasn't already been done?).

Most of the todos seem pretty reasonable, and many of them seem like they should be achievable in fairly short order. I've also been noting down some other ideas I had (API conveniences and stuff), but I'll hold back with that until we get there :)

I'm going to work this alongside Rote, and I'm happy to work on whichever bits are necessary from either C or Ruby. One direct suggestion I do have is that we switch to a Rake build, to make things easier when (if?) we introduce gems and website into the equation.

Anyway, just my thoughts so far...

Cheers,
Ross

--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Fri Dec 16 19:31:25 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Sat, 17 Dec 2005 00:31:25 -0000
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com>
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com>
Message-ID:

Hi,

I've been quietly picking up patches and the like, and playing around a bit with libxml. I received the patch I forwarded here today, and have also patched updates for GCC 4.0 and so forth. With your agreement, I'd like to get libxml rolling again by working toward a 'ground zero' release (just really incorporating the contributed patches). I am aware of the need to get a workable development plan together but right now I feel it's most important to get working code out to the users, and I have the time to do it at the moment.

Any objections to this?
Cheers,
Ross

------- Forwarded message -------
From: "Jon Smirl"
To: ruby-talk at ruby-lang.org, "Ross Bamford"
Cc:
Subject: Parsing xhtml with libxml
Date: Fri, 16 Dec 2005 23:18:42 -0000

If you get errors complaining of undefined entities like &nbsp; when parsing xhtml it means you need to install the DTD for xhtml 1.0 or 1.1.

Example of a doctype for xhtml 1.1:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

You want to install the DTDs locally following the model in /etc/xml. If you don't, libxml will fetch the DTD from www.w3.org each time you parse a document. Needing to install these DTDs was not obvious to me and should be part of the documentation.

There is an RPM for xhtml 1.0 - "xhtml1-dtds-1.0-7". I couldn't find one for xhtml 1.1 so I downloaded it piecemeal from w3.org.

Installing the DTD does not automatically turn on validation. If you want to validate you need to turn it on:

  XML::Parser::default_validity_checking = TRUE

XML::Parser::default_load_external_dtd controls the loading of the 'external subset' (the definition for the character entities like &). It defaults to TRUE.

XML::Parser::default_load_external_dtd is broken. This fixes it.
Index: ruby_xml_parser.c
==========================================================
RCS file: /var/cvs/xml-tools/libxml-ruby/ruby_xml_parser.c,v
retrieving revision 1.1.1.1
diff -r1.1.1.1 ruby_xml_parser.c
274c274
< if (xmlSubstituteEntitiesDefaultValue)
---
> if (xmlLoadExtDtdDefaultValue)
916c916
< ruby_xml_parser_default_load_external_dtd_set, 0);
---
> ruby_xml_parser_default_load_external_dtd_get, 0);
918c918
< ruby_xml_parser_default_load_external_dtd_get, 1);
---
> ruby_xml_parser_default_load_external_dtd_set, 1);

Sam's patches for libxml are also needed:
http://www.intertwingly.net/blog/2005/11/05/Patch-for-libxml2s-Ruby-binding

--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Sun Dec 18 23:31:21 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 04:31:21 -0000
Subject: [libxml-devel] Fwd: Re: Ruby and XML, validation and DOM
References: <43a56a72$0$343$5fc30a8@news.tiscali.it>
Message-ID:

Hi,

I've spent today doing a bit of an audit of the code, noting todos, and documenting pretty much everything. This was sparked off by the message I replied to below. The doc is up at the URL listed (didn't want anyone chancing across it and thinking it's the real thing until it's done). It's pretty basic and involves some (educated) guesswork right now, but it's a start.

I now have a building, almost-testing (two tests excluded for the time being) codebase, I do not know if there were any further changes after Trans' work. I've incorporated a number of patches from the community, and marked a lot of extra todos in the source from things I've spotted today.

If everyone is really busy or has lost interest, that's fine - I'm happy to handle much of this myself. I just need one of the admins to give me the go-ahead and let me know the current legal position (LICENSE is out of CVS now?)
Cheers,
Ross

---- Forwarded Usenet-message ----
From: "Ross Bamford"
Newsgroups: comp.lang.ruby
Subject: Re: Ruby and XML, validation and DOM
Date: Mon, 19 Dec 2005 04:19:22 -0000
URL: news://

On Sun, 18 Dec 2005 15:04:46 -0000, Ross Bamford wrote:

> On Sun, 18 Dec 2005 13:56:16 -0000, Gendag wrote:
>
>> I need help; I need a lib to validate an XML file and to create
>> a DOM tree. I've searched the web but I didn't find anything; in truth
>> I've found libxml for Ruby but I didn't find any documentation for the
>> API...
>>
>
> Not that this is much help right now, but I am working on some basic
> documentation for the current libxml codebase. I'll put it up somewhere
> as an interim measure either today or tomorrow. I'll post a link when
> it's ready.
>

First cut is up on my site:

  http://roscopeco.co.uk/projects/libxml-ruby/

It's quite basic but it has the all-important 'what methods does it have' thing about it. It covers the last version released by Trans (I think?), version 0.3.4-04.04.14. Please be advised that a few bugs have been found:

* Problems with the default validation on XML::Parser (DTD/Schema validation should be unaffected)
* Some small memory leaks across the library
* Incompatibilities with GCC 4.0
* Moving nodes between Node/NodeSet operations can cause a segfault at Ruby exit.
* XPointer/XPath range support is buggy.

However I have the critical stuff patched up here and hopefully we'll be able to get an interim release out very soon.

NOTE: The URL above is temporary, so if you're reading from archives it'll probably be a redirect to libxml on rubyforge.
We (or I anyway) just aren't quite sure where that is yet ;))

--
Ross Bamford - rosco at roscopeco.co.uk

From sean at gigave.com Mon Dec 19 04:02:14 2005
From: sean at gigave.com (Sean Chittenden)
Date: Mon, 19 Dec 2005 01:02:14 -0800
Subject: [libxml-devel] Fwd: Re: Ruby and XML, validation and DOM
In-Reply-To:
References: <43a56a72$0$343$5fc30a8@news.tiscali.it>
Message-ID: <20051219090214.GG2019@mailhost.gigave.com>

> I now have a building, almost-testing (two tests excluded for the
> time being) codebase, I do not know if there were any further
> changes after Trans' work. I've incorporated a number of patches
> from the community, and marked a lot of extra todos in the source
> from things I've spotted today.
>
> If everyone is really busy or has lost interest, that's fine - I'm
> happy to handle much of this myself. I just need one of the admins
> to give me the go-ahead and let me know the current legal position
> (LICENSE is out of CVS now?)

I just categorized the project in the trove with the following:

License: Artistic License, BSD License, MIT/X Consortium License

I'd like to see this continue down the 2 or 3 clause BSD license or MIT license path. If that's the case, then go ahead. :) -sc

--
Sean Chittenden

From showaltb at adelphia.net Mon Dec 19 08:15:22 2005
From: showaltb at adelphia.net (Bob Showalter)
Date: Mon, 19 Dec 2005 08:15:22 -0500
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To:
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com>
Message-ID: <43A6B26A.4020705@adelphia.net>

Ross Bamford wrote:
> Hi,
>
> I've been quietly picking up patches and the like, and playing around a
> bit with libxml. I received the patch I forwarded here today, and have
> also patched updates for GCC 4.0 and so forth. With your agreement, I'd
> like to get libxml rolling again by working toward a 'ground zero' release
> (just really incorporating the contributed patches).
> I am aware of the
> need to get a workable development plan together but right now I feel it's
> most important to get working code out to the users, and I have the time
> to do it at the moment.
>
> Any objections to this?

No objections here. I like the idea of the "ground zero" release. Ross, why don't you go ahead and take the lead in getting your codebase + patches into CVS. I would suggest we create a libxml module in CVS (do we need libxsl as well?).

On a different note, I would like to create a www module and get a Rote project started in there so we can get a proper project home page going. I'll take that task barring any objections from the group. I have zilch design skills, but I can at least get some of the plumbing in :-)

I would also like to work on the gem packaging for this project.

From rosco at roscopeco.co.uk Mon Dec 19 12:03:41 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 17:03:41 -0000
Subject: [libxml-devel] Fwd: Re: Ruby and XML, validation and DOM
In-Reply-To: <20051219090215.37AAE216FCE@mailhost.gigave.com>
References: <43a56a72$0$343$5fc30a8@news.tiscali.it> <20051219090215.37AAE216FCE@mailhost.gigave.com>
Message-ID:

On Mon, 19 Dec 2005 09:02:14 -0000, Sean Chittenden wrote:

>> I now have a building, almost-testing (two tests excluded for the
>> time being) codebase, I do not know if there were any further
>> changes after Trans' work. I've incorporated a number of patches
>> from the community, and marked a lot of extra todos in the source
>> from things I've spotted today.
>>
>> If everyone is really busy or has lost interest, that's fine - I'm
>> happy to handle much of this myself. I just need one of the admins
>> to give me the go-ahead and let me know the current legal position
>> (LICENSE is out of CVS now?)
>
> I just categorized the project in the trove with the following:
>
> License: Artistic License, BSD License, MIT/X Consortium License
>
> I'd like to see this continue down the 2 or 3 clause BSD license or
> MIT license path. If that's the case, then go ahead. :) -sc
>

To be honest, any license is fine by me, though I tend to prefer MIT these days... :)

I'll get onto it. I think we'll need to sort out a LICENSE file at some point - I've never done multiple license before, and can't remember the original copyright lines, so I don't know how that might look...

(Sorry, not too up on the legal side of things)

Cheers,
Ross

--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Mon Dec 19 12:20:38 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 17:20:38 -0000
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To: <43A6B26A.4020705@adelphia.net>
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net>
Message-ID:

On Mon, 19 Dec 2005 13:15:22 -0000, Bob Showalter wrote:

> Ross Bamford wrote:
>> Hi,
>>
>> I've been quietly picking up patches and the like, and playing around a
>> bit with libxml. I received the patch I forwarded here today, and have
>> also patched updates for GCC 4.0 and so forth. With your agreement, I'd
>> like to get libxml rolling again by working toward a 'ground zero' release
>> (just really incorporating the contributed patches). I am aware of the
>> need to get a workable development plan together but right now I feel it's
>> most important to get working code out to the users, and I have the time
>> to do it at the moment.
>>
>> Any objections to this?
>
> No objections here. I like the idea of the "ground zero" release. Ross,
> why don't you go ahead and take the lead in getting your codebase +
> patches into CVS. I would suggest we create a libxml module in CVS (do
> we need libxsl as well?).
>

Until now I've pretty much ignored libxslt, thinking we should concentrate on getting xml out in a stable and usable form. xsl has more failing tests, but they seem more superficial for the most part. What I thought was to get xml into CVS and work on the few things there are, plus try to get a heads up from other platforms. I think it's most important that we get it compiling on OSX too.

Once we get that up and out the door, we could then start on xsl, pretty much doing the same thing to get a stable release out quickly. The feeling I get from the past few days is that xml already has a bit of a community around it, and it certainly seems more generally useful to me than xsl (but that could be just because I don't use XSL myself ;) ). Also, of course, xsl depends on xml so it makes sense to start bottom up I think.

I'm going to do a final sanity check, and then import this codebase to CVS, with 'libxml' as the module name. Do we have anyone on OSX, or Windows? If not, I'd like to post a message in c.l.ruby asking someone on those platforms to try it out and get us a list of the errors they have (if any). We could also do with a GCC 3.x check (it'd help me with some warnings I'm getting as well).

> On a different note, I would like to create a www module and get a Rote
> project started in there so we can get a proper project home page going.
> I'll take that task barring any objections from the group. I have zilch
> design skills, but I can at least get some of the plumbing in :-)
>
> I would also like to work on the gem packaging for this project.

Sounds good to me - I definitely think we need to start out strong with at least no critical bugs, a bit of doc, and a Gem-based install. My feeling with doco is that design doesn't matter too much as long as we get the information across (hence all the nearly-plaintext pages I come up with ;))

I think it makes sense to keep a separate Rote project, since we have the two libraries.
Let me know your experiences with Rote :)

Cheers,
Ross

--
Ross Bamford - rosco at roscopeco.co.uk

From showaltb at adelphia.net Mon Dec 19 13:17:07 2005
From: showaltb at adelphia.net (Bob Showalter)
Date: Mon, 19 Dec 2005 13:17:07 -0500
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To:
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net>
Message-ID: <43A6F923.5020506@adelphia.net>

Ross Bamford wrote:
> I'm going to do a final sanity check, and then import this codebase to
> CVS, with 'libxml' as the module name.

Sounds good.

> Do we have anyone on OSX, or
> Windows? If not, I'd like to post a message in c.l.ruby asking someone on
> those platforms to try it out and get us a list of the errors they have
> (if any). We could also do with a GCC 3.x check (it'd help me with some
> warnings I'm getting as well).

I'll be happy to test on Windows and FreeBSD with gcc 3.4 toolchain. I can also test on HP-UX. I don't have OSX.

>>I would also like to work on the gem packaging for this project.
>
> I think it makes sense to keep a separate Rote project, since we have the
> two libraries. Let me know your experiences with Rote :)

I'll get started. So far, I've done only a 1-page "site" with Rote at http://net-netrc.rubyforge.org, but it's extremely easy to use; I really like the idea of being able to use textile, rdoc, markdown, etc. for content.

From sean at gigave.com Mon Dec 19 13:50:43 2005
From: sean at gigave.com (Sean Chittenden)
Date: Mon, 19 Dec 2005 10:50:43 -0800
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To: <43A6B26A.4020705@adelphia.net>
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net>
Message-ID: <20051219185043.GA2776@mailhost.gigave.com>

> > I've been quietly picking up patches and the like, and playing
> > around a bit with libxml.
> > I received the patch I forwarded here
> > today, and have also patched updates for GCC 4.0 and so
> > forth. With your agreement, I'd like to get libxml rolling again
> > by working toward a 'ground zero' release (just really
> > incorporating the contributed patches). I am aware of the need to
> > get a workable development plan together but right now I feel it's
> > most important to get working code out to the users, and I have
> > the time to do it at the moment.
> >
> > Any objections to this?
>
> No objections here. I like the idea of the "ground zero"
> release. Ross, why don't you go ahead and take the lead in getting
> your codebase + patches into CVS. I would suggest we create a libxml
> module in CVS (do we need libxsl as well?).

CVS or SVN? I dislike both but think svn sucks the least. Yes, we should set up a libxslt module too.

> On a different note, I would like to create a www module and get a
> Rote project started in there so we can get a proper project home
> page going. I'll take that task barring any objections from the
> group. I have zilch design skills, but I can at least get some of
> the plumbing in :-)

Knock yourself out.

> I would also like to work on the gem packaging for this project.

Very much needed. -sc

--
Sean Chittenden

From showaltb at adelphia.net Mon Dec 19 13:57:00 2005
From: showaltb at adelphia.net (Bob Showalter)
Date: Mon, 19 Dec 2005 13:57:00 -0500
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To: <20051219185043.GA2776@mailhost.gigave.com>
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net> <20051219185043.GA2776@mailhost.gigave.com>
Message-ID: <43A7027C.6020104@adelphia.net>

Sean Chittenden wrote:
> CVS or SVN? I dislike both but think svn sucks the least. Yes, we
> should set up a libxslt module too.

Just inertia on my part. I know CVS; I don't know SVN. I'm willing to learn if that's the consensus of the group...
From sean at gigave.com Mon Dec 19 13:58:23 2005
From: sean at gigave.com (Sean Chittenden)
Date: Mon, 19 Dec 2005 10:58:23 -0800
Subject: [libxml-devel] Fwd: Re: Ruby and XML, validation and DOM
In-Reply-To:
References: <43a56a72$0$343$5fc30a8@news.tiscali.it> <20051219090215.37AAE216FCE@mailhost.gigave.com>
Message-ID: <20051219185823.GB2776@mailhost.gigave.com>

> > I just categorized the project in the trove with the following:
> >
> > License: Artistic License, BSD License, MIT/X Consortium License
> >
> > I'd like to see this continue down the 2 or 3 clause BSD license
> > or MIT license path. If that's the case, then go ahead. :) -sc
>
> To be honest, any license is fine by me, though I tend to prefer MIT
> these days... :)
>
> I'll get onto it. I think we'll need to sort out a LICENSE file at
> some point - I've never done multiple license before, and can't
> remember the original copyright lines, so I don't know how that
> might look...
>
> (Sorry, not too up on the legal side of things)

I've caved and just use the MIT license too. For a while I was hunting for a viral license that says, "this software MIT licensed, can not be re-licensed or forked, and can't be used in GPL projects, but can be used in commercial products as one would expect with the MIT license" and a "lite" version of a license that says "this software MIT licensed, can not be re-licensed, and when used with GPL software, can only be linked into the GPL software."

GPL--

But, until such time as a GPL vaccine is desirable and the economics of the "GPL support model leads to terrible software" theory is proven, MIT license it is.
:) -sc

--
Sean Chittenden

From sean at gigave.com Mon Dec 19 14:01:04 2005
From: sean at gigave.com (Sean Chittenden)
Date: Mon, 19 Dec 2005 11:01:04 -0800
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To:
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net>
Message-ID: <20051219190104.GC2776@mailhost.gigave.com>

> >> I've been quietly picking up patches and the like, and playing
> >> around a bit with libxml. I received the patch I forwarded here
> >> today, and have also patched updates for GCC 4.0 and so
> >> forth. With your agreement, I'd like to get libxml rolling again
> >> by working toward a 'ground zero' release (just really
> >> incorporating the contributed patches). I am aware of the need to
> >> get a workable development plan together but right now I feel
> >> it's most important to get working code out to the users, and I
> >> have the time to do it at the moment.
> >>
> >> Any objections to this?
> >
> > No objections here. I like the idea of the "ground zero" release. Ross,
> > why don't you go ahead and take the lead in getting your codebase +
> > patches into CVS. I would suggest we create a libxml module in CVS (do
> > we need libxsl as well?).
>
> Until now I've pretty much ignored libxslt, thinking we should
> concentrate on getting xml out in a stable and usable form. xsl has
> more failing tests, but they seem more superficial for the most
> part. What I thought was get xml into CVS and work on the few things
> there are, plus try to get a heads up from other platforms. I think
> it's most important that we get it compiling on OSX too.

My desktop as of 5 days ago is now OS-X, I'm all for getting this to work on OS-X and will work on that. It used to work on my powerbook once upon a time, but...

> Once we get that up and out the door, we could then start on xsl,
> pretty much doing the same thing to get a stable release out
> quickly.
> The feeling I get from the past few days is that xml already
> has a bit of a community around it, and it certainly seems more
> generally useful to me than xsl (but that could be just because I
> don't use XSL myself ;) ). Also, of course, xsl depends on xml so it
> makes sense to start bottom up I think.

XSL is a trivial module by comparison to libxml. libxml suffers from API bloat and is rather expansive in terms of the API it presents and we need to translate to Ruby. -sc

--
Sean Chittenden

From rosco at roscopeco.co.uk Mon Dec 19 14:03:42 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 19:03:42 -0000
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To: <20051219185044.56C5E216FF8@mailhost.gigave.com>
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net> <20051219185044.56C5E216FF8@mailhost.gigave.com>
Message-ID:

On Mon, 19 Dec 2005 18:50:43 -0000, Sean Chittenden wrote:

> CVS or SVN? I dislike both but think svn sucks the least.

I'm for CVS. Subversion doesn't like me very much.

--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Mon Dec 19 14:07:42 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 19:07:42 -0000
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To: <43A6F923.5020506@adelphia.net>
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net> <43A6F923.5020506@adelphia.net>
Message-ID:

On Mon, 19 Dec 2005 18:17:07 -0000, Bob Showalter wrote:

>
> I'll be happy to test on Windows and FreeBSD with gcc 3.4 toolchain. I
> can also test on HP-UX. I don't have OSX.
>

Cool. When you test it, could you capture Make's output and post it, so I can check out some possible changes between 3 and 4?

It'll be in CVS/SVN pretty much as soon as we choose one or the other.
--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Mon Dec 19 14:12:52 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 19:12:52 -0000
Subject: [libxml-devel] Fwd: Re: Ruby and XML, validation and DOM
In-Reply-To: <20051219185824.83DCD216FCE@mailhost.gigave.com>
References: <43a56a72$0$343$5fc30a8@news.tiscali.it> <20051219090215.37AAE216FCE@mailhost.gigave.com> <20051219185824.83DCD216FCE@mailhost.gigave.com>
Message-ID:

On Mon, 19 Dec 2005 18:58:23 -0000, Sean Chittenden wrote:

>
> I've caved and just use the MIT license too. For a while I was
> hunting for a viral license that says, "this software MIT licensed,
> can not be re-licensed or forked, and can't be used in GPL projects,
> but can be used in commercial products as one would expect with the
> MIT license" and a "lite" version of a license that says "this
> software MIT licensed, can not be re-licensed, and when used with GPL
> software, can only be linked into the GPL software." GPL--
>
> But, until such time as a GPL vaccine is desirable and the economics
> of the "GPL support model leads to terrible software" theory is
> proven, MIT license it is. :) -sc
>

:D That's why I stay out of the legal issues. I'll drop a copy of the MIT in there with the same copyright as the old license.

--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Mon Dec 19 14:14:39 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 19:14:39 -0000
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To:
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net> <20051219185044.56C5E216FF8@mailhost.gigave.com>
Message-ID:

On Mon, 19 Dec 2005 19:03:42 -0000, Ross Bamford wrote:

> On Mon, 19 Dec 2005 18:50:43 -0000, Sean Chittenden wrote:
>
>> CVS or SVN? I dislike both but think svn sucks the least.
>
> I'm for CVS. Subversion doesn't like me very much.
>

That said, though, if you still prefer SVN I don't really mind. I may have some basic questions, though, when it starts playing its tricks again... ;)

--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Mon Dec 19 14:17:21 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 19:17:21 -0000
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To: <20051219190106.152FA216FF8@mailhost.gigave.com>
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net> <20051219190106.152FA216FF8@mailhost.gigave.com>
Message-ID:

On Mon, 19 Dec 2005 19:01:04 -0000, Sean Chittenden wrote:

>
> My desktop as of 5 days ago is now OS-X, I'm all for getting this to
> work on OS-X and will work on that. It used to work on my powerbook
> once upon a time, but...
>

Cool, I think a lot of people have gone Mac these days, so it's important we can support it :)

> XSL is a trivial module by comparison to libxml. libxml suffers from
> API bloat and is rather expansive in terms of the API it presents and
> we need to translate to Ruby. -sc
>

I've noted down a few ideas for this kind of thing in TODOs in the code. I too think that libxml's naming convention blows monkey ass ;)

--
Ross Bamford - rosco at roscopeco.co.uk

From sean at gigave.com Mon Dec 19 15:08:53 2005
From: sean at gigave.com (Sean Chittenden)
Date: Mon, 19 Dec 2005 12:08:53 -0800
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To:
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net> <20051219185044.56C5E216FF8@mailhost.gigave.com>
Message-ID: <20051219200853.GD2776@mailhost.gigave.com>

> >> CVS or SVN? I dislike both but think svn sucks the least.
> >
> > I'm for CVS. Subversion doesn't like me very much.
>
> That said, though, if you still prefer SVN I don't really mind. I
> may have some basic questions, though, when it starts playing its
> tricks again...
> ;)

The only reason I like svn at the moment more than cvs is that I can do offline diffs on my laptop, which lets me work on this kinda stuff on planes. :) I'll be more than happy to answer any Q's that folk have though before we commit to one or the other. -sc

--
Sean Chittenden

From rosco at roscopeco.co.uk Mon Dec 19 15:18:27 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 20:18:27 -0000
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To: <20051219200855.06485217008@mailhost.gigave.com>
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net> <20051219185044.56C5E216FF8@mailhost.gigave.com> <20051219200855.06485217008@mailhost.gigave.com>
Message-ID:

On Mon, 19 Dec 2005 20:08:53 -0000, Sean Chittenden wrote:

> The only reason I like svn at the moment more than cvs is that I can
> do offline diffs on my laptop, which lets me work on this kinda stuff
> on planes. :) I'll be more than happy to answer any Q's that folk
> have though before we commit to one or the other. -sc
>

Aww, hell. I just committed to CVS. I'm putting a message together with some notes about it. What I was going to say, though, was that it appears we can switch anyway, so I just thought to get this in some kind of repository for now. Sorry about that.

With SVN, I've not had that much experience - it's as simple as that. I 'get' CVS, and am used to it. When I've used subversion before it's surprised me in a few ways, which I don't like when those mistakes are version controlled for all to see ;) Branching and stuff (specifically, merging between branches) with SVN gets me in a mess, so I'd have to stay away from all of that kind of stuff for a while ...
--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Mon Dec 19 15:29:43 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Mon, 19 Dec 2005 20:29:43 -0000
Subject: [libxml-devel] Fwd: Parsing xhtml with libxml
In-Reply-To:
References: <9e4733910512161518m71324a06j3fc2687689e286e8@mail.gmail.com> <43A6B26A.4020705@adelphia.net> <43A6F923.5020506@adelphia.net>
Message-ID:

On Mon, 19 Dec 2005 19:07:42 -0000, I wrote:

>
> It'll be in CVS/SVN pretty much as soon as we choose one or the other.
>

Module name is 'libxml'

Some notes:

* There are three excluded tests, one pertaining to parser context num_chars, and two around xpointer/xpath ranges. The former I'm not sure about - I don't know what the context's num chars is supposed to be. The test expects 17, the answer is 42. You can include these tests in a run with:

    rake test NOTWORKING=true

* I modified one expectation, since from reading the spec it seems to me that an xpointer expression id('two') should give the element two rather than either its ID or its content (which I think was the original expectation). Another way to look at this is: should Node#to_s on a foo element give back just foo's content, or a string representation of the whole element (as now)?

* There's a Rakefile that will generate RDoc and run the tests. It will also handle running extconf and building the so (with make) if necessary. I think we should retain the C dependencies in Make, because Gems expects that I believe.

* I moved a few of the tests around, and got rid of the 'Outdated' stuff (retaining the files used in the tests in tests/model).

* There are a fair few TODOs in the code, as well as some cursory observations I made about rubyisms, problems, and so on.

* I applied the patch I forwarded here the other day, and also patches from: http://www.intertwingly.net/blog/2005/11/05/Patch-for-libxml2s-Ruby-binding with a few minor modifications.
This includes removing some lvalue casts (GCC hates them now), but I have a feeling my rusty C may be missing a subtlety here? Again, sorry for jumping the gun with this, I was just about to commit when you raised the question and wanted to get it out to you guys quickly. -- Ross Bamford - rosco at roscopeco.co.uk From rosco at roscopeco.co.uk Wed Dec 21 20:56:48 2005 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Thu, 22 Dec 2005 01:56:48 -0000 Subject: [libxml-devel] Memprof ruby? and GCC 4 Message-ID: Hi, I've been having 'fun' today scrutinising a lot of GCC literature and the like, and trying to find the best way to do some memory profiling on this code. This is where I'm at: * I can't get Libxml2 2.6.22 to compile from source with GCC 4, and I am too long out of the C tooling scene that I can't figure out why. It's not a compile problem per se, but rather than some stuff seems to have changed and broken the autoconf. So at present, I can't compile libxml with memory debugging enabled. * Ruby memory profiling isn't too well supported? I have right now a (ruby-side) hack I found on ruby-talk, but it's not too helpful and I could do with a few pointers to any other useful stuff that might help? I'm still getting a lot of chatter during compilation, so I'm not really sure what kind of baseline I have. Tomorrow I am going to set up a spare machine with a gcc 3 based distribution and try to get some bearings from there. After that, my time's probably up until after xmas (pesky family ... ;D). I read somewhere about a GC.always feature (name/impl to be decided) that's going into 1.9, mainly for extension developers, which from the looks of things would make this kind of thing much easier to deal with. I think I should be able to hook in already somehow for alloc and collection, but I don't have any kind of handle on Ruby from C yet really... On another note, when I was collecting patches I didn't apply a couple of feature patches - only immediate fixes. 
I'm thinking of putting those outstanding in the tracker so we can make decisions based on stable code later on. -- Ross Bamford - rosco at roscopeco.remove.co.uk From sean at gigave.com Fri Dec 23 00:35:52 2005 From: sean at gigave.com (Sean Chittenden) Date: Thu, 22 Dec 2005 21:35:52 -0800 Subject: [libxml-devel] Memprof ruby? and GCC 4 In-Reply-To: References: Message-ID: <20051223053552.GE11127@mailhost.gigave.com> > I've been having 'fun' today scrutinising a lot of GCC literature > and the like, and trying to find the best way to do some memory > profiling on this code. This is where I'm at: Define memory profiling, as in performance and page hits, or debugging as in valgrind? > * I can't get Libxml2 2.6.22 to compile from source with GCC > 4, and I am too long out of the C tooling scene that I can't > figure out why. It's not a compile problem per se, but rather > than some stuff seems to have changed and broken the > autoconf. So at present, I can't compile libxml with memory > debugging enabled. If you have error messages, toss'em here and I'll debug. Starting on the 24th, I'm out of the office and will have time to work on this kinda stuff. > * Ruby memory profiling isn't too well supported? I have right > now a (ruby-side) hack I found on ruby-talk, but it's not too > helpful and I could do with a few pointers to any other useful > stuff that might help? The phrase "fugly" comes to mind with regards to profiling, but specifics in terms of what you're looking for would be helpful. The "shadow" objects and pseudo refcount system that's in place is pretty brain-damaged and in need of some TLC. > I'm still getting a lot of chatter during compilation, so I'm not > really sure what kind of baseline I have. Tomorrow I am going to set > up a spare machine with a gcc 3 based distribution and try to get > some bearings from there. After that, my time's probably up until > after xmas (pesky family ... ;D). 
Getting this to work on gcc4 is important since GCC 4 is much more strict about what's legal and what's not. Chances are, gcc4's pointing out some pretty valid bugs/problems. > I read somewhere about a GC.always feature (name/impl to be decided) > that's going into 1.9, mainly for extension developers, which from > the looks of things would make this kind of thing much easier to > deal with. I think I should be able to hook in already somehow for > alloc and collection, but I don't have any kind of handle on Ruby > from C yet really... Ask away, I've done several extensions at this point. > On another note, when I was collecting patches I didn't apply a > couple of feature patches - only immediate fixes. I'm thinking of > putting those outstanding in the tracker so we can make decisions > based on stable code later on. Post'em as diff(1) -u'ed files for review. -sc -- Sean Chittenden From showaltb at gmail.com Thu Dec 29 14:47:40 2005 From: showaltb at gmail.com (Bob Showalter) Date: Thu, 29 Dec 2005 14:47:40 -0500 Subject: [libxml-devel] Project web page started Message-ID: <43B43D5C.6000402@gmail.com> I've created a *very* rudimentary web page at http://libxml.rubyforge.org. Any suggestions for embellishments or additional information are more than welcome. The rote project is in CVS under 'www' I also added a 'pubdoc' task to the libxml Rakefile to publish the generated rdoc to http://libxml.rubyforge.org/doc Also the source code compiles fine under FreeBSD 5.3 with gcc 3.4.2; looking at the tests now... Bob From rosco at roscopeco.co.uk Fri Dec 23 10:03:01 2005 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Fri, 23 Dec 2005 15:03:01 -0000 Subject: [libxml-devel] Memprof ruby? 
and GCC 4 In-Reply-To: <20051223053553.AC6EA216E04@mailhost.gigave.com> References: <20051223053553.AC6EA216E04@mailhost.gigave.com> Message-ID: (Apologies for long message) Hi, On Fri, 23 Dec 2005 05:35:52 -0000, Sean Chittenden wrote: >> I've been having 'fun' today scrutinising a lot of GCC literature >> and the like, and trying to find the best way to do some memory >> profiling on this code. This is where I'm at: > > Define memory profiling, as in performance and page hits, or debugging > as in valgrind? > ;) The latter. Thx. Now I just need learn to use it. I've run a quick leak check to try it out and am attaching the reports, though they may be useless since I don't know what I'm doing really ;) It's just that 'definitely lost' bit at the bottom that mentions ruby_xml_document... >> * I can't get Libxml2 2.6.22 to compile from source with GCC >> 4, and I am too long out of the C tooling scene that I can't >> figure out why. It's not a compile problem per se, but rather >> than some stuff seems to have changed and broken the >> autoconf. So at present, I can't compile libxml with memory >> debugging enabled. > > If you have error messages, toss'em here and I'll debug. Starting on > the 24th, I'm out of the office and will have time to work on this > kinda stuff. > I'm going to hold off on recompiling libxml now, since I think I may be able to get away without it. The binaries I have for Libxml have debugging enabled, but not memory debugging, but maybe with valgrind it's not needed. I'm also concerned it's something that's broken here, since I can't find any references to the problem anywhere :( >> * Ruby memory profiling isn't too well supported? I have right >> now a (ruby-side) hack I found on ruby-talk, but it's not too >> helpful and I could do with a few pointers to any other useful >> stuff that might help? > > The phrase "fugly" comes to mind with regards to profiling, but > specifics in terms of what you're looking for would be helpful. 
The > "shadow" objects and pseudo refcount system that's in place is pretty > brain-damaged and in need of some TLC. > This is what I've figured so far. Mostly I need to get refcounts, byte sizes aren't important. It'd be nice to be there for each GC, but not really necessary. I'm zeroing in on a problem with (I believe) node sets that causes a segfault at Ruby exit. What I believe is happening is something like: * Run a range XPointer, makes an XPath, makes a nodeset. * (maybe; some temporary nodeset is freed?) * Work with nodeset, finish with it * (or; this nodeset is freed? either way the XPath keeps it's ref I guess) * Something hangs around, prevented from being GCed for some reason.* * (Ruby shutdown) GC runs, tries to free already freed nodeset, goes wrong. (*not sure if it's being prevented, or just not getting a chance until exit - need a more in-depth testcase for it). If I'm right then I think this is also a likely culprit for the underlying problem with the general nodeset usage I mentioned in my previous post. This is a little difficult to reproduce, but I'm working on a reliably unreliable testcase for it so I can see what's happening. It happens often enough that I'd probably consider it critical as far as XPointer support is concerned if I could verify it's not just a local problem. >> I'm still getting a lot of chatter during compilation, so I'm not >> really sure what kind of baseline I have. Tomorrow I am going to set >> up a spare machine with a gcc 3 based distribution and try to get >> some bearings from there. After that, my time's probably up until >> after xmas (pesky family ... ;D). > > Getting this to work on gcc4 is important since GCC 4 is much more > strict about what's legal and what's not. Chances are, gcc4's > pointing out some pretty valid bugs/problems. 
>

My 'spare machine' idea isn't going to fly anyway, so it's GCC 4 anyway I guess :) The only actual errors I had (going on the code I got from Trans) were about a couple of lvalue casts, but I'm getting a *lot* of 'differ in signedness' warnings that my brain says to dismiss, but that ring alarm bells somewhere anyway. It seems to be passing Ruby VALUEs around that causes it, so maybe it's just one of those 'expected PITA' things? Or a local misconfiguration?

>> I read somewhere about a GC.always feature (name/impl to be decided)
>> that's going into 1.9, mainly for extension developers, which from
>> the looks of things would make this kind of thing much easier to
>> deal with. I think I should be able to hook in already somehow for
>> alloc and collection, but I don't have any kind of handle on Ruby
>> from C yet really...
>
> Ask away, I've done several extensions at this point.
>

Thanks :) I'm still in 'dumb question' territory yet, and I'm trying to cut down on those for christmas :D Mostly I'm hitting my head on nuances of C programming I'd long forgotten, but to be honest it's pretty nice to be back at that level. Once I'm a bit more sure it's not all my fault I'll have some intelligent questions (hopefully ;)).

One thing I do wonder, is if there's more to those lvalue casts I patched away than meets the eye. I applied the patch I mentioned which seemed fine at the time (if slightly redundant with a double cast?) but now with the problems that are appearing I'm concerned there's more to it. An example is (extracted from node_set.c, this isn't actual code):

    void *data;

    // was
    // (rx_xpath_data *)data = (rx_xpath_data *)rxnset->data;

    // now
    data = (void*)(rx_xpath_data *)rxnset->data;
    free((rx_xpath_data *)data);

It seems to me that the (void*) is redundant, but it should work fine?
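A minimal sketch of the same change in isolation, assuming made-up types: `demo_xpath_data`, `demo_node_set` and `demo_release` are hypothetical stand-ins and not the real definitions from node_set.c. It only illustrates that a correctly typed local pointer removes the need for any cast, since `void *` converts implicitly in C.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the extension's types; the real
 * definitions in node_set.c may well differ. */
typedef struct demo_xpath_data {
    int ref_count;
} demo_xpath_data;

typedef struct demo_node_set {
    void *data;   /* opaque pointer, as in the snippet above */
} demo_node_set;

/* Older GCC releases accepted a cast on the left of '=' as an
 * extension:
 *     (demo_xpath_data *)data = (demo_xpath_data *)set->data;
 * GCC 4 rejects it. A typed local needs no cast at all. */
int demo_release(demo_node_set *set)
{
    demo_xpath_data *data;
    int last_count;

    assert(set != NULL);
    data = set->data;          /* implicit conversion from void * */
    if (data == NULL)
        return -1;             /* nothing left to free */

    last_count = data->ref_count;
    free(data);
    set->data = NULL;          /* a second call becomes a no-op */
    return last_count;
}
```

Clearing `set->data` after the free is an extra safeguard added for the sketch; it is not something the original patch is known to do.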
However, I don't really get how/why lvalue casts are used with pointers, so I'm worried I may have blitzed something important here :/ Also, I don't know the ins-and-outs of ruby GC so I'm in the dark as to what we can just free and what should be registered with Ruby - any good links on this stuff, or pointers to any code I might follow from other projects / ruby itself that'd be good to learn it from, would be _much_ appreciated. (If it makes any difference, I think ruby.h checks that sizeof void* == sizeof unsigned long) >> On another note, when I was collecting patches I didn't apply a >> couple of feature patches - only immediate fixes. I'm thinking of >> putting those outstanding in the tracker so we can make decisions >> based on stable code later on. > > Post'em as diff(1) -u'ed files for review. -sc > Okay, I'll hold them for now until we've got a reasonably 'known good' codebase, then diff them from that. Apologies for my lack of foresight with all this, I kind of assume it's like riding a bike but it's always taken me extra time to get back into C. There's just so much going on in every line, and I think Java actually damages my programming skills in this regard :( Don't worry, though - I'm a sink or swim learner. Cheers, Ross Btw, Merry xmas, one and all :) -- Ross Bamford - rosco at roscopeco.co.uk -------------- next part -------------- A non-text attachment was scrubbed... Name: all-test.err Type: application/octet-stream Size: 72500 bytes Desc: not available Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20051223/98ae252c/all-test-0002.obj -------------- next part -------------- A non-text attachment was scrubbed... 
Name: all-test.out Type: application/octet-stream Size: 685 bytes Desc: not available Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20051223/98ae252c/all-test-0003.obj From sean at gigave.com Thu Dec 29 16:55:58 2005 From: sean at gigave.com (Sean Chittenden) Date: Thu, 29 Dec 2005 13:55:58 -0800 Subject: [libxml-devel] Memprof ruby? and GCC 4 In-Reply-To: References: <20051223053553.AC6EA216E04@mailhost.gigave.com> Message-ID: <20051229215558.GI28159@mailhost.gigave.com> > (Apologies for long message) A long, technical discourse on a -devel list? Never! Say it ain't so! *grin* > >>I've been having 'fun' today scrutinising a lot of GCC literature > >>and the like, and trying to find the best way to do some memory > >>profiling on this code. This is where I'm at: > > > >Define memory profiling, as in performance and page hits, or > >debugging as in valgrind? > > ;) The latter. Thx. Now I just need learn to use it. I've run a > quick leak check to try it out and am attaching the reports, though > they may be useless since I don't know what I'm doing really ;) It's > just that 'definitely lost' bit at the bottom that mentions > ruby_xml_document... valgrind and ruby don't get along so well... I haven't tried very recently, but Ruby does some tricks that valgrind wasn't able to follow and it generated a lot of false positives... not to say Valgrind hasn't improved, but there used to be problems with this once upon a time. > >> * I can't get Libxml2 2.6.22 to compile from source with GCC > >> 4, and I am too long out of the C tooling scene that I can't > >> figure out why. It's not a compile problem per se, but rather > >> than some stuff seems to have changed and broken the > >> autoconf. So at present, I can't compile libxml with memory > >> debugging enabled. > > > >If you have error messages, toss'em here and I'll debug. Starting on > >the 24th, I'm out of the office and will have time to work on this > >kinda stuff.
> > I'm going to hold off on recompiling libxml now, since I think I may > be able to get away without it. The binaries I have for Libxml have > debugging enabled, but not memory debugging, but maybe with valgrind > it's not needed. I'm also concerned it's something that's broken > here, since I can't find any references to the problem anywhere :( The memory debugging in libxml2(3) is really cool... simple, but very, very cool. It just outputs a list of data (addresses, the bytes, and the action - free, malloc, realloc, etc) to a file that can be processed to see what libxml is doing. Very, very useful and a great overall debugging technique in general. > >> * Ruby memory profiling isn't too well supported? I have right > >> now a (ruby-side) hack I found on ruby-talk, but it's not too > >> helpful and I could do with a few pointers to any other useful > >> stuff that might help? > > > >The phrase "fugly" comes to mind with regards to profiling, but > >specifics in terms of what you're looking for would be helpful. > >The "shadow" objects and pseudo refcount system that's in place is > >pretty brain-damaged and in need of some TLC. > > This is what I've figured so far. Mostly I need to get refcounts, > byte sizes aren't important. It'd be nice to be there for each GC, > but not really necessary. I'm zeroing in on a problem with (I > believe) node sets that causes a segfault at Ruby exit. What I > believe is happening is something like: > > * Run a range XPointer, makes an XPath, makes a nodeset. > * (maybe; some temporary nodeset is freed?) > * Work with nodeset, finish with it > * (or; this nodeset is freed? either way the XPath keeps it's ref I > guess) > * Something hangs around, prevented from being GCed for some reason.* > * (Ruby shutdown) GC runs, tries to free already freed nodeset, goes > wrong. > > (*not sure if it's being prevented, or just not getting a chance until > exit - need a more in-depth testcase for it). 
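The suspected sequence in the quoted list (a node set released once during normal use, then released again when the interpreter shuts down) can be sketched with a plain reference count in C. This is a generic illustration only, not the extension's actual shadow-object code; every name here (`demo_shared`, `demo_shared_new`, `demo_shared_retain`, `demo_shared_release`) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared object: each holder takes a reference, and the
 * memory is released only when the last reference is dropped. */
typedef struct demo_shared {
    int refs;
    int *freed_flag;   /* test hook: set to 1 when memory is released */
} demo_shared;

demo_shared *demo_shared_new(int *freed_flag)
{
    demo_shared *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->refs = 1;               /* the creator holds the first reference */
    s->freed_flag = freed_flag;
    return s;
}

void demo_shared_retain(demo_shared *s)
{
    assert(s != NULL);
    s->refs++;
}

/* Returns 1 if this call released the memory, 0 otherwise. */
int demo_shared_release(demo_shared *s)
{
    assert(s != NULL && s->refs > 0);
    if (--s->refs > 0)
        return 0;              /* someone else still holds it */
    if (s->freed_flag != NULL)
        *s->freed_flag = 1;
    free(s);
    return 1;
}
```

If both the XPath object and the node set wrapper took a reference this way, whichever is collected last would perform the single free. Whether that matches what the extension needs is an open question in this thread.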
This sounds like a problem with the mark/sweep of libxml ruby objects... likely a bug that I introduced and never resolved (or triggered). > If I'm right then I think this is also a likely culprit for the > underlying problem with the general nodeset usage I mentioned in my > previous post. This is a little difficult to reproduce, but I'm > working on a reliably unreliable testcase for it so I can see what's > happening. It happens often enough that I'd probably consider it > critical as far as XPointer support is concerned if I could verify > it's not just a local problem. If you have a test case, I can debug it and commit a fix. As I'd alluded to earlier, I'm not particularly happy with some of the specifics for how I handled objects in libxml and would likely want to revisit with a fresh new approach instead of using my... mmm... hastily thunk out implementation. At the time, I was just grinding out a ton of code and was less interested in the nitty grittys of making it a great solution simply due to some timelines that I had at the time. I remember two things: 1) trying to avoid relying on the Ruby GC where possible simply because it was so slow compared to manual management, but I've gotten over that (where "so slow" really translates to only a few percent in terms of performance). 2) Consolidating pointers to nodes so that writing and querying of a given document would work "correctly". I'm not sure I ever completed or got this bit of code working correctly, so who knows where it stands... with 20/20 vision, I think I'd rather just make it known that there potentially could be two instances of the same node unless you're careful. We'll see once I revisit some of this. I guess I just need a kick in the ass to work on this stuff again given I'm hacking together some stuff right now that involves cat(1), xmllint(1), xsltproc(1), grep(1), and a few other text bits that shouldn't be employed...
> >>I'm still getting a lot of chatter during compilation, so I'm not > >>really sure what kind of baseline I have. Tomorrow I am going to > >>set up a spare machine with a gcc 3 based distribution and try to > >>get some bearings from there. After that, my time's probably up > >>until after xmas (pesky family ... ;D). > > > >Getting this to work on gcc4 is important since GCC 4 is much more > >strict about what's legal and what's not. Chances are, gcc4's > >pointing out some pretty valid bugs/problems. > > My 'spare machine' idea isn't going to fly anyway, so it's GCC 4 > anyway I guess :) The only actual errors I had (going on the code I > got from Trans) were about a couple of lvalue casts, but I'm getting > a *lot* of 'differ in signedness' warnings that my brain says to > dismiss, but that ring alarm bells somewhere anyway. It seems to be > passing Ruby VALUEs around that causes it, so maybe it's just one of > those 'expected PITA' things? Or a local misconfiguration? Ehh... gcc4 really did a nice job of starting to clean up some of this stuff and has caused a great number of headaches that the OSS community haven't caught up to quite yet. > >>I read somewhere about a GC.always feature (name/impl to be > >>decided) that's going into 1.9, mainly for extension developers, > >>which from the looks of things would make this kind of thing much > >>easier to deal with. I think I should be able to hook in already > >>somehow for alloc and collection, but I don't have any kind of > >>handle on Ruby from C yet really... > > > >Ask away, I've done several extensions at this point. > > Thanks :) I'm still in 'dumb question' territory yet, and I'm trying > to cut down on those for christmas :D You picked a hell of an extension to start with... then again, this was my 1st extension too, so I'm not particularly judgmental and will freely admit to being a dumbshit or making mistakes... but only for this extension, everything else I do is perfect and always flawless... err.. 
*grin*

> Mostly I'm hitting my head on nuances of C programming I'd long
> forgotten, but to be honest it's pretty nice to be back at that
> level. Once I'm a bit more sure it's not all my fault I'll have some
> intelligent questions (hopefully ;)).

And even if you don't, I'd rather you ask and keep the energy levels associated with this project higher rather than lower because as interest wanes, so does involvement and that's what led to this code being shelved for a while.

> One thing I do wonder, is if there's more to those lvalue casts I
> patched away than meets the eye. I applied the patch I mentioned
> which seemed fine at the time (if slightly redundant with a double
> cast?) but now with the problems that are appearing I'm concerned
> there's more to it. An example is (extracted from node_set.c, this
> isn't actual code):
>
> void *data;
>
> // was
> // (rx_xpath_data *)data = (rx_xpath_data *)rxnset->data;
>
> // now
> data = (void*)(rx_xpath_data *)rxnset->data;
> free((rx_xpath_data *)data);
>
> It seems to me that the (void*) is redundant, but it should work
> fine?

OMG! I really did do some horrible things... a special part of hell is starting to warm up for me after seeing this example... ugh. wait! Maybe I shouldn't be in the extra crispy section quite yet... I seem to recall this being the way I fixed cyclic dependencies in structures. Casts to void * are a hairy subject and should be used as a hack, which I did... this should be revisited. rxnset->data shouldn't be a void * and I think only was a void * due to cyclical dependencies for #include's, actually. ie:

    struct a {
        struct a *aptr;
        struct b *bptr;
    };

    struct b {
        struct a *aptr;
        struct b *bptr;
    };

So I worked around it with void*.
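One common alternative to `void *` for mutually referencing structures is a forward declaration, since C allows pointers to incomplete struct types. The sketch below uses hypothetical names (`demo_a`, `demo_b`, `demo_link`) and is not taken from the extension's headers; with real headers that include each other, the two forward declarations would typically live in a small shared header.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declarations: an incomplete type is enough to declare a
 * pointer to it, so neither struct needs the other's full definition. */
struct demo_a;
struct demo_b;

struct demo_a {
    struct demo_a *aptr;
    struct demo_b *bptr;
};

struct demo_b {
    struct demo_a *aptr;
    struct demo_b *bptr;
};

/* Point one of each at the other, with typed pointers throughout, and
 * report whether following a -> b -> a leads back to the start. */
int demo_link(struct demo_a *a, struct demo_b *b)
{
    assert(a != NULL && b != NULL);
    a->aptr = a;
    a->bptr = b;
    b->aptr = a;
    b->bptr = b;
    return a->bptr->aptr == a;
}
```

With typed members in place, the casts around `rxnset->data` would no longer be needed, and the compiler could check each assignment.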
> However, I don't really get how/why lvalue casts are used with > pointers, so I'm worried I may have blitzed something important here > :/ Also, I don't know the ins-and-outs of ruby GC so I'm in the dark > as to what we can just free and what should be registered with Ruby > - any good links on this stuff, or pointers to any code I might > follow from other projects / ruby itself that'd be good to learn it > from, would be _much_ appreciated. Eh, this code is probably not going to help you decipher the mysteries of mark/sweep GC's since I'm sure I made a mess of things by trying to avoid the ruby gc where possible. > >>On another note, when I was collecting patches I didn't apply a > >>couple of feature patches - only immediate fixes. I'm thinking of > >>putting those outstanding in the tracker so we can make decisions > >>based on stable code later on. > > > >Post'em as diff(1) -u'ed files for review. -sc > > Okay, I'll hold them for now until we've got a reasonably 'known > good' codebase, then diff them from that. > > Apologies for my lack of foresight with all this, I kind of assume > it's like riding a bike but it's always taken me extra time to get > back into C. There's just so much going on in every line, and I > think Java actually damages my programming skills in this regard :( > Don't worry, though - I'm a sink or swim learner. Heh. Yeah, Ruby/C is a bit dense compared to Java, but Ruby/C and its macros do a ton of work so there's not much need for a lot of the complexity that you'd run into using Java. -sc -- Sean Chittenden From pat.eyler at gmail.com Thu Dec 29 17:26:48 2005 From: pat.eyler at gmail.com (pat eyler) Date: Thu, 29 Dec 2005 15:26:48 -0700 Subject: [libxml-devel] Memprof ruby? 
and GCC 4 In-Reply-To: <20051229215558.GI28159@mailhost.gigave.com> References: <20051223053553.AC6EA216E04@mailhost.gigave.com> <20051229215558.GI28159@mailhost.gigave.com> Message-ID: <6fd0654b0512291426t7619c5d1qd8774103432ecdcc@mail.gmail.com> On 12/29/05, Sean Chittenden wrote: > > (Apologies for long message) > > A long, technical discourse on a -devel list? Never! Say it ain't > so! *grin* > > > >>I've been having 'fun' today scrutinising a lot of GCC literature > > >>and the like, and trying to find the best way to do some memory > > >>profiling on this code. This is where I'm at: > > > > > >Define memory profiling, as in performance and page hits, or > > >debugging as in valgrind? > > > > ;) The latter. Thx. Now I just need learn to use it. I've run a > > quick leak check to try it out and am attaching the reports, though > > they may be useless since I don't know what I'm doing really ;) It's > > just that 'definitely lost' bit at the bottom that mentions > > ruby_xml_document... > > valgrind and ruby don't get along so well... I haven't tried very > recently, but Ruby does some tricks that valgrind wasn't able to > follow and it generated a lot of false positives... not to say > Valigrind hasn't improved, but there used to be problems with this > once upon a time. Are you using suppressions to avoid the false positives? I've got a suppression file (it's in need of serious refactoring -- I harvested poorly named suppressions and haven't bothered renaming them) that I've attached. I was using it while I worked with zenspider and the metaruby folks on building tests. You use it like this: valgrind --suppressions=supp --tool=memcheck ruby TestFloat.ts [deleted] > > > I'm going to hold off on recompiling libxml now, since I think I may > > be able to get away without it. The binaries I have for Libxml have > > debugging enabled, but not memory debugging, but maybe with valgrind > > it's not needed. 
I'm also concerned it's something that's broken > > here, since I can't find any references to the problem anywhere :( > > The memory debugging in libxml2(3) is really cool... simple, but very, > very cool. It just outputs a list of data (addresses, the bytes, and > the action - free, malloc, realloc, etc) to a file that can be > processed to see what libxml is doing. Very, very useful and a great > overall debugging technique in general. > > [deleted] > > >Getting this to work on gcc4 is important since GCC 4 is much more > > >strict about what's legal and what's not. Chances are, gcc4's > > >pointing out some pretty valid bugs/problems. I'm starting to think that there's a hidden need here. Between occasional oddities between Ruby releases and popular packages (like Rake), and with the accelerating progress on GCC, maybe it makes sense to start watching things more closely. Andy Lester runs the Phalanx project, which runs the test suites of popular libraries on pre-release versions of Perl to verify that they'll work correctly. Does it make sense to try to do something similar with Ruby (and probably Ruby built against gcc 4.0, 4.1 pre releases, and 4.2 pre releases)? [deleted] > > Heh. Yeah, Ruby/C is a bit dense compared to Java, but Ruby/C and its > macros do a ton of work so there's not much need for a lot of the > complexity that you'd run into using Java. Yeah, I've got a lot of learning/remembering to do.
> > -sc > > -- > Sean Chittenden > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel > -- thanks, -pate ------------------------- From rosco at roscopeco.co.uk Thu Dec 29 19:24:21 2005 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Fri, 30 Dec 2005 00:24:21 -0000 Subject: [libxml-devel] Project web page started In-Reply-To: <43B43D5C.6000402@gmail.com> References: <43B43D5C.6000402@gmail.com> Message-ID: On Thu, 29 Dec 2005 19:47:40 -0000, Bob Showalter wrote: > I've created a *very* rudimentary web page at > http://libxml.rubyforge.org. Any suggestions for embellishments or > additional information are more than welcome. > > The rote project is in CVS under 'www' > > I also added a 'pubdoc' task to the libxml Rakefile to publish the > generated rdoc to http://libxml.rubyforge.org/doc > Nice work, I'll set up a redirect at that temporary documentation URL I posted before. > Also the source code compiles fine under FreeBSD 5.3 with gcc 3.4.2; > looking at the tests now... > Cool. I hoped to supply a failing case to demonstrate the memory bug but I can't reproduce it now (explained in reply to Sean). All tests should pass with 'rake test' while 'rake test NOTWORKING=' should give you three failures, which we need to look into I guess. I'm assuming now that you didn't see lots of 'differ in signedness' when compiling? Cheers, -- Ross Bamford - rosco at roscopeco.co.uk From rosco at roscopeco.co.uk Thu Dec 29 19:25:02 2005 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Fri, 30 Dec 2005 00:25:02 -0000 Subject: [libxml-devel] Memprof ruby? 
and GCC 4 In-Reply-To: <20051229215559.AB7D72195A6@mailhost.gigave.com> References: <20051223053553.AC6EA216E04@mailhost.gigave.com> <20051229215559.AB7D72195A6@mailhost.gigave.com> Message-ID: On Thu, 29 Dec 2005 21:55:58 -0000, Sean Chittenden wrote: >> (Apologies for long message) > > A long, technical discourse on a -devel list? Never! Say it ain't > so! *grin* > :D > valgrind and ruby don't get along so well... I haven't tried very > recently, but Ruby does some tricks that valgrind wasn't able to > follow and it generated a lot of false positives... not to say > Valigrind hasn't improved, but there used to be problems with this > once upon a time. > I've noted a few false positives (or maybe a lot) coming out of Ruby itself but I have a hunch it may be worth checking the document_find leak it came up with, because that method did seem to be involved in a lot of the problematic stuff I was seeing at runtime. Now, that said, I have to admit, I can't seem to reproduce the segfault-at-exit problem I was seeing with Ruby 1.8.3 on either 1.8.4 or a 1.9 snapshot from just before christmas. The common theme I think is that these are both locally compiled for i686, while I didn't notice that my (platform binary) 1.8.3 was compiled for i386 until after I started investigating. When everything is compiled for the same arch it seems fine (though there is still Trans' memory leak todo outstanding). >> >> * I can't get Libxml2 2.6.22 to compile from source with GCC >> >> 4, and I am too long out of the C tooling scene that I can't >> >> figure out why. It's not a compile problem per se, but rather >> >> than some stuff seems to have changed and broken the >> >> autoconf. So at present, I can't compile libxml with memory >> >> debugging enabled. >> > >> >If you have error messages, toss'em here and I'll debug. Starting on >> >the 24th, I'm out of the office and will have time to work on this >> >kinda stuff. 
>> >> I'm going to hold off on recompiling libxml now, since I think I may >> be able to get away without it. The binaries I have for Libxml have >> debugging enabled, but not memory debugging, but maybe with valgrind >> it's not needed. I'm also concerned it's something that's broken >> here, since I can't find any references to the problem anywhere :( > > The memory debugging in libxml2(3) is really cool... simple, but very, > very cool. It just outputs a list of data (addresses, the bytes, and > the action - free, malloc, realloc, etc) to a file that can be > processed to see what libxml is doing. Very, very useful and a great > overall debugging technique in general. > That was the picture I got from the doc, so it's pretty infuriating that I can't switch it on :-| It's probably a really basic thing, but ./configure dies with: configure: error: C++ preprocessor "/lib/cpp" fails sanity check See `config.log' for more details." though config.log only seems to show that message, and an environment dump. This is the reason I jumped ship from C pretty early on - not so much the language, which I feel fairly comfortable with, but the tools, and the amount of work involved in getting stuff to work together. Now, though, it feels like a pretty big gap in my knowledge, which I need to fix... >> >> * Run a range XPointer, makes an XPath, makes a nodeset. >> * (maybe; some temporary nodeset is freed?) >> * Work with nodeset, finish with it >> * (or; this nodeset is freed? either way the XPath keeps it's ref I >> guess) >> * Something hangs around, prevented from being GCed for some reason.* >> * (Ruby shutdown) GC runs, tries to free already freed nodeset, goes >> wrong. >> >> (*not sure if it's being prevented, or just not getting a chance until >> exit - need a more in-depth testcase for it). 
>

(This is the bug I can't reproduce now)

> As I'd
> alluded to earlier, I'm not particularly happy with some of the
> specifics for how I handled objects in libxml and would likely want to
> revisit with a fresh new approach instead of using my...
> mmm... hastily thunk out implementation. At the time, I was just
> grinding out a ton of code and was less interested in the nitty
> grittys of making it a great solution simply due to some timelines
> that I had at the time. I remember two things:
>
> 1) trying to avoid relying on the Ruby GC where possible simply
> because it was so slow compared to manual management, but I've gotten
> over that (where "so slow" really translates to only a few percent in
> terms of performance).
>
> 2) Consolidating pointers to nodes so that writing and querying of a
> given document would work "correctly". I'm not sure I ever
> completed or got this bit of code working correctly, so who knows
> where it stands... with 20/20 vision, I think I'd rather just make
> it known that there potentially could be two instances of the same
> node unless you're careful. We'll see once I revisit some of this.
> I guess I just need a kick in the ass to work on this stuff again
> given I'm hacking together some stuff right now that involves
> cat(1), xmllint(1), xsltproc(1), grep(1), and a few other text bits
> that shouldn't be employed...
>

*kick* Just think how much cleaner it'd be in Ruby...

There were a few memory management things I noticed that made me put a
'what's happening here?' type note in a todo, not because they're wrong
but because I didn't understand them, so I guess that's probably the GC
stuff. Anyway, how about I start writing a few aggressive tests around
general memory management and the node copying stuff to see what turns
up...?

>> >>I'm still getting a lot of chatter during compilation, so I'm not
>> >>really sure what kind of baseline I have.
>> >>Tomorrow I am going to
>> >>set up a spare machine with a gcc 3 based distribution and try to
>> >>get some bearings from there. After that, my time's probably up
>> >>until after xmas (pesky family ... ;D).
>> >
>> >Getting this to work on gcc4 is important since GCC 4 is much more
>> >strict about what's legal and what's not. Chances are, gcc4's
>> >pointing out some pretty valid bugs/problems.
>>
>> My 'spare machine' idea isn't going to fly anyway, so it's GCC 4
>> anyway I guess :) The only actual errors I had (going on the code I
>> got from Trans) were about a couple of lvalue casts, but I'm getting
>> a *lot* of 'differ in signedness' warnings that my brain says to
>> dismiss, but that ring alarm bells somewhere anyway. It seems to be
>> passing Ruby VALUEs around that causes it, so maybe it's just one of
>> those 'expected PITA' things? Or a local misconfiguration?
>
> Ehh... gcc4 really did a nice job of starting to clean up some of
> this stuff and has caused a great number of headaches that the OSS
> community haven't caught up to quite yet.
>

Okay, good (in that I can ignore it, until it's fixed in Ruby?)

>> >>I read somewhere about a GC.always feature (name/impl to be
>> >>decided) that's going into 1.9, mainly for extension developers,
>> >>which from the looks of things would make this kind of thing much
>> >>easier to deal with. I think I should be able to hook in already
>> >>somehow for alloc and collection, but I don't have any kind of
>> >>handle on Ruby from C yet really...
>> >
>> >Ask away, I've done several extensions at this point.
>>
>> Thanks :) I'm still in 'dumb question' territory yet, and I'm trying
>> to cut down on those for christmas :D
>
> You picked a hell of an extension to start with... then again, this
> was my 1st extension too, so I'm not particularly judgmental and will
> freely admit to being a dumbshit or making mistakes... but only for
> this extension, everything else I do is perfect and always
> flawless... err..
> *grin*
>

Oh, everything I do sucks. It's remarkably liberating ;D

I know I've jumped in at the deep end with this, but I've found that I
learn things best by doing something real. I'm a big believer that if I
learn the hard way, everything else comes easy. It does mean I make a
fool of myself a lot, but I'm used to that :-/

In the meantime I'll of course refrain from any large-scale rewriting ;)

>> One thing I do wonder, is if there's more to those lvalue casts I
>> patched away than meets the eye. I applied the patch I mentioned
>> which seemed fine at the time (if slightly redundant with a double
>> cast?) but now with the problems that are appearing I'm concerned
>> there's more to it. An example is (extracted from node_set.c, this
>> isn't actual code):
>>
>>   void *data;
>>
>>   // was
>>   // (rx_xpath_data *)data = (rx_xpath_data *)rxnset->data;
>>
>>   // now
>>   data = (void*)(rx_xpath_data *)rxnset->data;
>>   free((rx_xpath_data *)data);
>>
>> It seems to me that the (void*) is redundant, but it should work
>> fine?
>
> OMG! I really did do some horrible things... a special part of hell
> is starting to warm up for me after seeing this example... ugh.
>

Lol!

> wait! Maybe I shouldn't be in the extra crispy section quite yet... I
> seem to recall this being the way I fixed cyclic dependencies in
> structures. Casts to void * are a hairy subject and should be used as
> a hack, which I did... this should be revisited. rxnset->data
> shouldn't be a void * and I think only was a void * due to cyclical
> dependencies for #include's, actually. ie:
>
> struct a {
>   struct a *aptr;
>   struct b *bptr;
> };
>
> struct b {
>   struct a *aptr;
>   struct b *bptr;
> };
>
> So I worked around it with void*.
>

Part of me shudders, but another part is impressed. This is exactly the
kind of subtlety that I missed through leaving C without ever doing
anything real with it.
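
For reference, a minimal sketch of the two C points in this exchange. It is not the actual libxml binding code: the names xpath_data, node_set, node_set_new and node_set_release are hypothetical stand-ins rather than identifiers from node_set.c. It assumes the cyclic dependency can be handled with forward declarations, which let two structs hold typed pointers to each other, and it replaces the lvalue cast with a plain assignment to a typed local.

```c
#include <assert.h>
#include <stdlib.h>

/* Forward declarations: an incomplete struct type is enough to declare
 * a pointer to it, so headers that refer to each other need no void *. */
struct a;
struct b;

struct a {
    struct a *aptr;
    struct b *bptr;
};

struct b {
    struct a *aptr;
    struct b *bptr;
};

/* Hypothetical stand-in for the node set's private XPath data. */
struct xpath_data {
    int placeholder;
};

/* Hypothetical stand-in for the node set wrapper; data is left untyped
 * here only to mirror the quoted example. */
struct node_set {
    void *data;
};

/* Allocate a node set with zeroed private data. */
struct node_set node_set_new(void)
{
    struct node_set set;
    set.data = calloc(1, sizeof(struct xpath_data));
    assert(set.data != NULL);
    return set;
}

/* Free the private data without any lvalue cast.
 * Returns 1 if something was freed, 0 if there was nothing to free. */
int node_set_release(struct node_set *set)
{
    /* void * converts implicitly to any object pointer type in C. */
    struct xpath_data *data = set->data;

    if (data == NULL)
        return 0;
    free(data);
    set->data = NULL; /* a second call now becomes a harmless no-op */
    return 1;
}
```

Clearing the pointer after the free is only one way to make a repeated release safe; whether that fits the real ownership rules between the binding and the Ruby GC is a separate question.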
>> However, I don't really get how/why lvalue casts are used with
>> pointers, so I'm worried I may have blitzed something important here
>> :/ Also, I don't know the ins-and-outs of ruby GC so I'm in the dark
>> as to what we can just free and what should be registered with Ruby
>> - any good links on this stuff, or pointers to any code I might
>> follow from other projects / ruby itself that'd be good to learn it
>> from, would be _much_ appreciated.
>
> Eh, this code is probably not going to help you decipher the mysteries
> of mark/sweep GC's since I'm sure I made a mess of things by trying to
> avoid the ruby gc where possible.
>

I think I'm okay with GC concepts in general thanks to my tinkering with
Java's JNI, but I was confused by seeing code here manually handling
some stuff. I guessed there must be some Ruby-specific rules for what
can and can't be managed, but I see now that it's actually optimisation.

>
> Heh. Yeah, Ruby/C is a bit dense compared to Java, but Ruby/C and its
> macros do a ton of work so there's not much need for a lot of the
> complexity that you'd run into using Java.
>

Now that I'm starting to get used to it a bit, I'm pretty impressed,
especially by the way classes and modules are defined. To begin with I
was expecting something along the lines of Perl with all its markstack
and sv2mortal crap, so it's a pleasant surprise and I'm looking forward
to figuring it out properly :)

Cheers,

--
Ross Bamford - rosco at roscopeco.co.uk

From rosco at roscopeco.co.uk Thu Dec 29 19:25:13 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Fri, 30 Dec 2005 00:25:13 -0000
Subject: [libxml-devel] Memprof ruby?
and GCC 4
In-Reply-To: <6fd0654b0512291426t7619c5d1qd8774103432ecdcc@mail.gmail.com>
References: <20051223053553.AC6EA216E04@mailhost.gigave.com>
    <20051229215558.GI28159@mailhost.gigave.com>
    <6fd0654b0512291426t7619c5d1qd8774103432ecdcc@mail.gmail.com>
Message-ID:

On Thu, 29 Dec 2005 22:26:48 -0000, pat eyler wrote:

> On 12/29/05, Sean Chittenden wrote:
>> > (Apologies for long message)
>>
>> A long, technical discourse on a -devel list? Never! Say it ain't
>> so! *grin*
>>
>> > >>I've been having 'fun' today scrutinising a lot of GCC literature
>> > >>and the like, and trying to find the best way to do some memory
>> > >>profiling on this code. This is where I'm at:
>> > >
>> > >Define memory profiling, as in performance and page hits, or
>> > >debugging as in valgrind?
>> >
>> > ;) The latter. Thx. Now I just need to learn to use it. I've run a
>> > quick leak check to try it out and am attaching the reports, though
>> > they may be useless since I don't know what I'm doing really ;) It's
>> > just that 'definitely lost' bit at the bottom that mentions
>> > ruby_xml_document...
>>
>> valgrind and ruby don't get along so well... I haven't tried very
>> recently, but Ruby does some tricks that valgrind wasn't able to
>> follow and it generated a lot of false positives... not to say
>> Valgrind hasn't improved, but there used to be problems with this
>> once upon a time.
>
> Are you using suppressions to avoid the false positives? I've got
> a suppression file (it's in need of serious refactoring -- I harvested
> poorly named suppressions and haven't bothered renaming them)
> that I've attached. I was using it while I worked with zenspider
> and the metaruby folks on building tests. You use it like this:
>
> valgrind --suppressions=supp --tool=memcheck ruby TestFloat.ts
>

That sounds like something I could use, but I think the attachment got
lost? I'd appreciate it if you could send it again...?
> [deleted]
>> > >Getting this to work on gcc4 is important since GCC 4 is much more
>> > >strict about what's legal and what's not. Chances are, gcc4's
>> > >pointing out some pretty valid bugs/problems.
>
> I'm starting to think that there's a hidden need here. Between
> occasional oddities between Ruby releases and popular packages (like
> Rake), and with the accelerating progress on GCC, maybe it makes sense
> to start watching things more closely.
>
> Andy Lester runs the Phalanx project, which runs the testsuites of
> popular libraries on pre-release versions of Perl to verify that they'll
> work correctly. Does it make sense to try to do something similar
> with Ruby (and probably Ruby built against gcc 4.0, 4.1
> pre releases, and 4.2 pre releases)?
>

I definitely think so. Most of my confusion and legwork so far could
have been avoided if such data were available. As it is, I had no way to
know which warnings / problems I should be expecting, and which were
particular to this project. Most of what I've found has come from (non
Ruby) mailing lists and from what Sean has told me - it'd be enormously
useful to have the pertinent information in one place.

Cheers,

--
Ross Bamford - rosco at roscopeco.co.uk

From transfire at gmail.com Fri Dec 30 00:04:46 2005
From: transfire at gmail.com (TRANS)
Date: Fri, 30 Dec 2005 00:04:46 -0500
Subject: [libxml-devel] Project web page started
In-Reply-To: <43B43D5C.6000402@gmail.com>
References: <43B43D5C.6000402@gmail.com>
Message-ID: <4b6f054f0512292104l7da711d3ne41eb666082b4f18@mail.gmail.com>

On 12/29/05, Bob Showalter wrote:
> I've created a *very* rudimentary web page at
> http://libxml.rubyforge.org. Any suggestions for embellishments or
> additional information are more than welcome.

Have a look at the old XML:Tools website. It has a chart of speed
comparisons between libxml and rexml which I thought was interesting.

T.
From rosco at roscopeco.co.uk Fri Dec 30 00:50:51 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Fri, 30 Dec 2005 05:50:51 -0000
Subject: [libxml-devel] That bug again, some ideas
Message-ID:

Hi,

Well, I've been trying for days to replicate that bug. I finally bite
the bullet and say 'can't reproduce', then *bam* I reproduce it.
Reliably. grr...

I've attached the stack trace and a small script that reproduces it
every time (here at least).

Also, I've been playing with a few ideas for 'more ruby' in the API,
just really prototyping things in Ruby to see how it might work. I've
attached it as 'libxml-x.rb' - you can either run it for a little demo
or require it instead of 'libxml'.

Bear in mind though it's just suggestions, and only prototype
implementation - a lot of it is pretty naive, I don't have concrete
use-cases for much of it, and some stuff (all stuff?) would be better
done in C for the performance of the thing. The main thrust really is to
bring Enumerable into play, have nodes be comparable based on their name
(maybe should be limited to element nodes), and have to_s / to_a behave
consistently for nodesets and stuff.

--
- Ross Bamford - rosco at roscopeco.co.uk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug_test.rb
Type: application/octet-stream
Size: 200 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20051230/778cd254/bug_test.obj
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bug_trace.txt
Url: http://rubyforge.org/pipermail/libxml-devel/attachments/20051230/778cd254/bug_trace.txt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: libxml-x.rb
Type: application/octet-stream
Size: 3389 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20051230/778cd254/libxml-x.obj

From rosco at roscopeco.co.uk Fri Dec 30 00:53:15 2005
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Fri, 30 Dec 2005 05:53:15 -0000
Subject: [libxml-devel] Project web page started
In-Reply-To: <4b6f054f0512292104l7da711d3ne41eb666082b4f18@mail.gmail.com>
References: <43B43D5C.6000402@gmail.com>
    <4b6f054f0512292104l7da711d3ne41eb666082b4f18@mail.gmail.com>
Message-ID:

On Fri, 30 Dec 2005 05:04:46 -0000, TRANS wrote:

> On 12/29/05, Bob Showalter wrote:
>> I've created a *very* rudimentary web page at
>> http://libxml.rubyforge.org. Any suggestions for embellishments or
>> additional information are more than welcome.
>
> Have a look at the old XML:Tools website. It has a chart of speed
> comparisons between libxml and rexml which I thought was interesting.
>
> T.
>

Oh yes, I'd forgotten about that. That comparison should _definitely_
follow along :)

--
Ross Bamford - rosco at roscopeco.co.uk