[libxml-devel] memory consumption when finding inside of large document never goes away
Sean Chittenden
sean at chittenden.org
Mon Aug 11 17:38:01 EDT 2008
> I am parsing 120K of XML into a document and then running
>
> def get_nodes(node, namespace)
>   self.find("./dn:#{node}", "dn:#{namespace}")
> end
>
> several times.
>
> Memory usage for my test driver sits at 20 megs if I run get_nodes
> less than 10 times. If I run get_nodes 1000 times my memory usage
> jumps from 20 megs to around 140 megs and does not come back down
> until the process exits. If I force a GC.start at the end of each
> loop I can keep the memory usage down but that is not practical in
> the real world where I need this code to be at least somewhat fast.
>
> I am only building the document once during the entire duration of
> the test program so the parsing of the large string should not be a
> problem.
>
> Any ideas as to why my memory usage grows and then never comes down?
If the memory usage levels off at a certain point instead of growing
without bound (which would indicate a leak), then this is a "problem"
with the Ruby GC and not with libxml. libxml just relies on Ruby's GC
for memory management. See if there is an updated GC patch that you
can apply. I don't have the URL handy, but this post refers to it:
http://antoniocangiano.com/2007/02/10/top-10-ruby-on-rails-performance-tips/
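One way to tell a plateau from a leak is to force a full collection after each batch of calls and check whether the number of live Ruby objects keeps climbing. This is a minimal sketch assuming a modern Ruby (2.2 or later, for `GC.stat[:heap_live_slots]`); the helper names `live_slots_after_batches` and `steadily_growing?` and the 1,000-slot threshold are hypothetical. It only counts Ruby objects: memory that libxml2 allocates on the C side, and the process size reported by the OS, are not captured here.

```ruby
# Hypothetical helper: run `work` in batches, force a full GC after each
# batch, and record how many Ruby object slots survived the collection.
def live_slots_after_batches(batches:, per_batch:, &work)
  samples = []
  batches.times do
    per_batch.times { |i| work.call(i) }
    GC.start                                # full, immediate collection
    samples << GC.stat[:heap_live_slots]    # objects still reachable
  end
  samples
end

# Rough heuristic (an assumption, not a standard test): growth of at least
# `min_step` slots between every pair of samples suggests retained objects.
def steadily_growing?(samples, min_step: 1_000)
  samples.each_cons(2).all? { |a, b| b - a >= min_step }
end

# Stand-in workloads: one keeps every result alive, one discards them.
retained = []
leaky = live_slots_after_batches(batches: 5, per_batch: 10_000) { |i| retained << "node#{i}" }
flat  = live_slots_after_batches(batches: 5, per_batch: 10_000) { |i| "node#{i}" }
p steadily_growing?(leaky)
p steadily_growing?(flat)
```

The first series should grow by roughly `per_batch` slots per sample because `retained` holds every string; the second should stay roughly flat, which is the "lazy GC, not a leak" case. For a real check, replace the block body with the `find` call being investigated.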
One could argue, however, that using GC.start is practical if done in
tight loops. What exactly are you trying to do with your fragments?
Maybe there's a more efficient way of getting the result you're
interested in.
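If an occasional pause is acceptable, the cost of `GC.start` can be amortized by calling it only every N iterations rather than on every pass. A minimal sketch, assuming a modern Ruby (2.3 or later, for `Integer#positive?`); the helper name `each_with_periodic_gc` and the default interval of 100 are hypothetical choices, not part of libxml-ruby.

```ruby
# Hypothetical helper: yield each item and force a full collection every
# `every` iterations, trading some speed for a lower memory ceiling.
def each_with_periodic_gc(items, every: 100)
  raise ArgumentError, "every must be positive" unless every.positive?

  items.each_with_index do |item, index|
    yield item
    GC.start if ((index + 1) % every).zero?
  end
end

# Usage sketch: `doc`, "entry" and "urn:example" are placeholders standing
# in for the document and arguments from the quoted message.
# each_with_periodic_gc(1..1000, every: 100) { doc.get_nodes("entry", "urn:example") }
```

A larger `every` means fewer pauses but a higher peak; the right interval depends on how much garbage each iteration produces and is best found by measuring.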
-sc