[libxml-devel] memory consumption when finding inside of large document never goes away

Charlie Savage cfis at savagexi.com
Sat Aug 16 18:33:58 EDT 2008



Matthew Margolis wrote:
> I don't mind setting nodes = nil before calling GC.start (read some 
> other threads so I think I understand why I have to do that) but I do 
> mind the speed hit, so if you think there is a way around that I would 
> love to know more. 

It would all be in the C code.  Something like:

xml_document:
   * For each xpath_object returned, store a pointer to it (an st_hash:
     key is a pointer, value is the object)
   * Update the API to include register_xpath_object and
     unregister_xpath_object
   * When freed, iterate over the xpath objects and tell any that are
     left to free their underlying C object (not the Ruby object)

xpath_object
   * On creation, call document.register_xpath_object
   * On freeing, call document.unregister_xpath_object
   * Add an API, called by the document, named free_xpath_object (or
     some such)
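
A rough, self-contained sketch of that bookkeeping in plain C, with no Ruby or libxml2 headers involved. Every name in it is made up for illustration and is not the binding's actual code. An intrusive linked list stands in for the st_hash, and a malloc'd buffer stands in for the underlying xmlXPathObject (which real code would release with xmlXPathFreeObject).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct xpath_object xpath_object;

/* Hypothetical stand-in for the document wrapper. */
typedef struct document {
    xpath_object *live;      /* registry: head of the list of live results */
} document;

/* Hypothetical stand-in for the xpath_object wrapper. */
struct xpath_object {
    document     *doc;       /* owning document; NULL once detached */
    void         *c_result;  /* stands in for the underlying xmlXPathObject */
    xpath_object *prev;
    xpath_object *next;
};

document *document_new(void)
{
    return calloc(1, sizeof(document));
}

/* Called by an xpath_object when it is created. */
void document_register_xpath_object(document *doc, xpath_object *xp)
{
    xp->doc = doc;
    xp->prev = NULL;
    xp->next = doc->live;
    if (doc->live) doc->live->prev = xp;
    doc->live = xp;
}

/* Called by an xpath_object when it is freed before its document. */
void document_unregister_xpath_object(document *doc, xpath_object *xp)
{
    assert(xp->doc == doc);
    if (xp->prev) xp->prev->next = xp->next;
    else          doc->live = xp->next;
    if (xp->next) xp->next->prev = xp->prev;
    xp->prev = xp->next = NULL;
    xp->doc = NULL;
}

/* Release only the underlying C result; the wrapper itself stays valid. */
void xpath_object_free_underlying(xpath_object *xp)
{
    free(xp->c_result);      /* free(NULL) is a no-op, so calling twice is safe */
    xp->c_result = NULL;
}

xpath_object *xpath_object_new(document *doc)
{
    xpath_object *xp = calloc(1, sizeof *xp);
    if (!xp) return NULL;
    xp->c_result = malloc(64);   /* placeholder for a real xpath result */
    document_register_xpath_object(doc, xp);
    return xp;
}

/* What the wrapper's free function would do when the GC collects it. */
void xpath_object_free(xpath_object *xp)
{
    if (xp->doc) document_unregister_xpath_object(xp->doc, xp);
    xpath_object_free_underlying(xp);
    free(xp);
}

/* Freeing the document releases the C side of every result still alive. */
void document_free(document *doc)
{
    while (doc->live) {
        xpath_object *xp = doc->live;
        document_unregister_xpath_object(doc, xp);
        xpath_object_free_underlying(xp);
    }
    free(doc);
}
```

With this shape the two objects can be freed in either order: a result freed first removes itself from the registry, and a result that outlives its document is left with a NULL c_result, so its own free has nothing more to release.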


> 
> My general calling pattern is 
> 1.  Document#find_first to get the most top level element I am 
> interested in 
> 2.  top_level_element#find for each of its direct children.  When I find 
> each child I then recurse down and load that children's children.

No need to call find for that.  Just iterate over the children directly;
it will be faster.  node.children, I think it is.

> So yes I am walking the entire tree which will create a bunch of 
> objects.  When only grabbing the top level element in my test program I 
> am still seeing a big spike in memory.  I looked at the XPath Object 
> code and it looks to me like this case is the one I am going to match 
> when trying to find the topmost element of interest.
> 
>   case XPATH_NODESET:
>     rval = Data_Wrap_Struct(cXMLXPathObject,
>                             ruby_xml_xpath_object_mark,
>                             ruby_xml_xpath_object_free,
>                             xpop);
> 
> I am not familiar with Data_Wrap_Struct(part of Ruby?) so I don't know 
> if it could potentially create lots of objects.

Yes, this wraps just the return object.  The xpop object looks like this:

http://xmlsoft.org/XSLT/object.gif

It's iterating over xpop->nodesetval that I was mentioning.

> 
> I will look at the XMLReader tests to try to get a better feel for if it 
> will meet my needs.  Thank you for the suggestion.

Sure - sounds like that will be your best bet.  Seems perfect for what 
you need.

Charlie