[libxml-devel] memory consumption when finding inside of large document never goes away

Sean Chittenden sean at chittenden.org
Mon Aug 11 17:38:01 EDT 2008


> I am parsing 120K of XML into a document and then running
>
>   def get_nodes(node, namespace)
>     self.find("./dn:#{node}", "dn:#{namespace}")
>   end
>
>  several times.
>
> Memory usage for my test driver sits at 20 megs if I run get_nodes  
> less than 10 times.  If I run get_nodes 1000 times my memory usage  
> jumps from 20 megs to around 140 megs and does not come back down  
> until the process exits.  If I force a GC.start at the end of each  
> loop I can keep the memory usage down but that is not practical in  
> the real world where I need this code to be at least somewhat fast.
>
> I am only building the document once during the entire duration of  
> the test program so the parsing of the large string should not be a  
> problem.
>
> Any ideas as to why my memory usage grows and then never comes down?

If the memory usage levels off at some ceiling rather than growing  
without bound (which would indicate a leak), then this is a "problem"  
with the Ruby GC and not with libxml.  libxml just leverages Ruby's GC  
for memory allocation, etc.  See if there is an updated GC patch that  
you can apply.  I don't have the URL handy, but this post makes  
reference to it:

http://antoniocangiano.com/2007/02/10/top-10-ruby-on-rails-performance-tips/
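
One rough way to tell a ceiling from a leak is to sample the heap after
each batch of work and see whether the numbers level off.  The sketch
below is only an illustration: plateaued?, sample_live_slots and the 5%
tolerance are hypothetical names and values, the string allocation is a
stand-in for the find calls, and GC.stat[:heap_live_slots] assumes a
reasonably recent Ruby.  It counts live Ruby objects only, not memory
held by the process or allocated on the C side.

```ruby
# Hypothetical helper: true when the second half of the samples grows by
# no more than `tolerance` (a fraction) relative to the midpoint sample.
def plateaued?(samples, tolerance: 0.05)
  return false if samples.size < 4
  mid  = samples[samples.size / 2]
  last = samples.last
  (last - mid) <= mid * tolerance
end

# Hypothetical helper: run the given block `per_batch` times per batch,
# force a collection, and record the number of live heap slots.
def sample_live_slots(batches, per_batch)
  batches.times.map do
    per_batch.times { yield }
    GC.start
    GC.stat[:heap_live_slots]
  end
end

# Stand-in workload: short-lived strings instead of real find results.
samples = sample_live_slots(8, 1_000) { "x" * 100 }
puts samples.inspect
puts plateaued?(samples) ? "levels off (GC behaviour)" : "still growing (possible leak)"
```

In a real test the block would call get_nodes on the already-parsed
document instead of allocating strings.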

One could argue, however, that using GC.start is practical if done in  
tight loops.  What exactly are you trying to do with your fragments?   
Maybe there's a more efficient way of getting the result you're  
interested in.
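
A middle ground between never calling GC.start and calling it on every
pass is to call it once every N iterations, so the cost is amortized.
A minimal sketch, assuming a hypothetical helper with_periodic_gc and an
arbitrary interval of 100; the string allocation again stands in for
the find calls:

```ruby
# Hypothetical helper (not part of libxml): run the block `iterations`
# times and force a GC pass after every `gc_every`-th iteration.
# Returns the number of forced GC passes.
def with_periodic_gc(iterations, gc_every: 100)
  gc_runs = 0
  iterations.times do |i|
    yield i
    # Collect only on every gc_every-th iteration to amortize the cost.
    if (i + 1) % gc_every == 0
      GC.start
      gc_runs += 1
    end
  end
  gc_runs
end

# Stand-in workload: allocate a short-lived string on each iteration.
runs = with_periodic_gc(1000, gc_every: 100) { |i| "node-#{i}" * 10 }
puts "forced GC passes: #{runs}"   # prints "forced GC passes: 10"
```

The right interval depends on how much garbage each call produces, so
it is worth measuring a few values rather than trusting the 100 used
here.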

-sc

--
Sean Chittenden
sean at chittenden.org




