[yt-dev] Memory leak issue with ParallelHOP code

Geoffrey So gsiisg at gmail.com
Thu Nov 17 09:53:01 PST 2011


Hello all, including Pragnesh; I hope your request went through.

I have since finished the troublesome 3200^3 dataset, so that's good news on
my part, but now I'm trying to track down what made it so difficult.

Let me summarize the problem I had; Pragnesh was continuing a discussion we
had off-list:

I was analyzing the 3200^3 dataset and, due to peak memory constraints on
Nautilus, I tried to analyze subvolumes, each 1/64 of the whole volume, one
at a time.  The behavior we found was that if we do not end the Python
script after each subvolume is analyzed with a parallelHF() call, and
instead go on to analyze the next 1/64 subvolume, the peak memory goes up,
according to the print statements.  If we instead restart the script after
each subvolume is analyzed, the peak memory comes back down to a level
consistent with the first subvolume.
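
For reference, a rough sketch of the kind of driver loop I mean is below
(the dataset name and the threshold are placeholders rather than our exact
values, and the subvolume keyword is the one from Stephen's subvolume
support, so check it against the signature in your tip):

    from yt.mods import *
    from yt.analysis_modules.halo_finding.api import parallelHF

    pf = load("DD0046/DD0046")  # placeholder dataset name
    n = 4  # split the domain into 4^3 = 64 subvolumes

    for i in range(n):
        for j in range(n):
            for k in range(n):
                left = na.array([i, j, k], dtype="float64") / n
                right = na.array([i + 1, j + 1, k + 1], dtype="float64") / n
                sv = pf.h.region((left + right) / 2.0, left, right)
                # Run parallel HOP on just this 1/64 subvolume.
                halos = parallelHF(pf, subvolume=sv, threshold=160.0)
                halos.write_out("halos_%d_%d_%d.txt" % (i, j, k))
                # Dropping the references does not bring peak memory back
                # down to the first-subvolume level.
                del halos, sv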

Attached is a plot of the memory output of parallelHF on 256 cores, so
every 256 points along the x-axis it looks like a step function.  Each step
corresponds to the peak memory used by parallelHF on one subvolume, and the
numbers just keep going up; if we end the calculation in the middle and
restart at a particular subvolume, the numbers drop back down.  Since the
numbers varied from core to core, we were wondering whether this is because
the particle data remains on each core even after the calculation.
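
The print statements behind that plot were nothing fancy; something along
these lines (a sketch, not our exact instrumentation):

    import resource

    def print_peak_memory(tag):
        # ru_maxrss is the high-water mark of the resident set size;
        # on Linux it is reported in kilobytes.
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print "%s: peak RSS = %.1f MB" % (tag, peak_kb / 1024.0)

One caveat to keep in mind: if the prints use ru_maxrss, it is a high-water
mark and can only ever step up within a single process, so only a restart
would show the numbers come back down.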

hg tip gave:
changeset:   4676:edd92d1eadf8
branch:      yt
tag:         tip
parent:      4671:85bc12462dff
parent:      4675:5efc2462a321
user:        Stephen Skory <s at skory.us>
date:        Wed Oct 19 13:44:33 2011 -0600
summary:     Merging from mainline.

This is the modification Stephen made to yt to help with this problem about
a month ago.  It definitely helped with the memory usage, but I wanted to
narrow the problem down to something more tangible than "a memory leak
somewhere inside yt".  Hopefully others who have run into the same problem,
or who are experts on parallelHF, can help us out.

These issues were found before the new optional KDtree was put in by
Stephen, so that is definitely something we can try: if the problem goes
away, then we can probably say that the memory issue is inside the Fortran
KDtree.  I might be the only one encountering this problem, since it only
shows up when using yt in an unconventional way (I believe the subvolume
feature was made for analyzing a single halo, not the whole volume in
pieces), so I was wondering if anyone else has come across this phenomenon,
maybe when running parallelHF on several datasets in one script and seeing
the peak memory increase?
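
One more cheap experiment that might help localize this (just a suggestion,
reusing the loop variables from the sketch above): force a garbage
collection between subvolumes and watch the current, not peak, RSS.  If the
current RSS keeps climbing even after a collect, the memory is probably
being held somewhere Python cannot see, e.g. inside the Fortran KDtree:

    import gc, resource

    def current_rss_mb():
        # Current resident set size, read from /proc on Linux
        # (field 2 of /proc/self/statm is resident pages).
        with open("/proc/self/statm") as f:
            pages = int(f.read().split()[1])
        return pages * resource.getpagesize() / (1024.0 * 1024.0)

    # After each subvolume's parallelHF() call:
    del halos, sv
    gc.collect()
    print "subvolume (%d,%d,%d): RSS = %.1f MB" % (i, j, k, current_rss_mb())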

From
G.S.


On Thu, Nov 17, 2011 at 9:49 AM, Patel, Pragneshkumar B <pragnesh at utk.edu> wrote:

>
>  ------------------------------
> *From:* Patel, Pragneshkumar B
> *Sent:* Thursday, November 17, 2011 12:18 PM
> *To:* Britton Smith
> *Cc:* Geoffrey So; s at skory.us
> *Subject:* RE: [yt-dev] Memory leak issue with ParallelHOP code
>
>   Hi Britton,
>
> I have subscribed to both yt-users and yt-dev; it is possible my requests
> are still pending. As far as the dataset is concerned, Geoffrey can give
> you a better explanation of it.
>
> The dataset contains regions 0-64. I tried to run ParallelHOP with all of
> the regions, but because of the memory issue I was not able to complete them
> all in one run. I then ran it with only regions 0-3 to debug the code in
> "parallel_hop_interface.py".
>
> Please also find the attached script. Let me know if you have more
> questions.
>
>
> Thanks
> Pragnesh
>
>
>
>  ------------------------------
> *From:* Britton Smith [brittonsmith at gmail.com]
> *Sent:* Thursday, November 17, 2011 12:00 PM
> *To:* yt-dev at lists.spacepope.org
> *Cc:* Patel, Pragneshkumar B
> *Subject:* Re: [yt-dev] Memory leak issue with ParallelHOP code
>
>  Hi again Pragnesh,
>
> I mistakenly asked you to post this to the enzo-users mailing list when I
> meant to say that you should post it to the yt-users mailing list.  Sorry
> about that; not enough coffee this morning.
>
> Britton
>
> On Thu, Nov 17, 2011 at 8:56 AM, Britton Smith <brittonsmith at gmail.com> wrote:
>
>> Hi Pragnesh,
>>
>> You need to provide us with some more information.  What do you mean when
>> you say you ran for 0-3 regions?  What kind of dataset are you trying to
>> analyze?  How big is it?  Can you also please post the script you are using
>> for this?
>>
>> Finally, could you please sign up for the enzo-users mailing list and
>> post this there?  You've posted this to the development list, which isn't
>> the best forum for this question.  Also, if you're not on the list you
>> won't receive any of the responses to your post.
>>
>> Britton
>>
>>  On Thu, Nov 17, 2011 at 7:29 AM, Patel, Pragneshkumar B <
>> pragnesh at utk.edu> wrote:
>>
>>>   Hi Stephen,
>>>
>>> I found a memory leak issue when I ran the Parallel_HOP code. It may not
>>> be in Python itself, but in the packages (e.g. the kdtree) that we call
>>> from "parallel_hop_interface.py". Please find the attached file, in which
>>> you can see the memory keep increasing with every region (I ran the
>>> program with 32 processes for regions 0-3).
>>>
>>> I would like to get your comments/suggestions on how to solve this issue.
>>>
>>> I am trying to find out more about it using Valgrind; I will let you know.
>>>
>>> Thanks
>>> Pragnesh
>>>
>>>
>>>
>>>
>>
>
>
[Attachment: mem.png, 42795 bytes -- the per-core peak memory plot described above.
URL: <http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/attachments/20111117/5d7c799c/attachment-0001.png>]

