[Yt-dev] 1024^3 HOP problems

Stephen Skory stephenskory at yahoo.com
Wed May 6 19:47:02 PDT 2009


> There are several debugging techniques that need to be executed.  I
> would recommend you instantiate the hierarchy interactively and
> examine the RAM in use. 

I did this on the login node.

>>> pf = load('DD0082')
>>> pf.h
....
>>> h.heap()
Partition of a set of 9002467 objects. Total size = 1204579016 bytes.
 Index   Count   %       Size   % Cumulative   % Kind (class / dict of class)
     0  441317   5  462500216  38  462500216  38 dict of yt.lagos.HierarchyType.EnzoGrid
     1  883469  10  248309432  21  710809648  59 dict (no owner)
     2 2206122  25  176489760  15  887299408  74 numpy.ndarray
     3  884892  10  112789472   9 1000088880  83 list
     4  515901   6   46752208   4 1046841088  87 str
     5 1767567  20   42421608   4 1089262696  90 numpy.float64
     6  441319   5   38836072   3 1128098768  94 __builtin__.weakproxy
     7  441317   5   31774824   3 1159873592  96 yt.lagos.HierarchyType.EnzoGrid
     8  444668   5   10672032   1 1170545624  97 int
     9  441319   5   10591656   1 1181137280  98 numpy.int32

That comes to 1.2GB, which is a fair amount to heft around per thread. I've done runs on Ranger and Kraken with up to 4GB per thread, which I think should be sufficient for the data.
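
As a rough sanity check on those numbers, here is a minimal sketch that just divides the sizes reported by heap() by the number of grids. The variable names are mine, and reading "4GB per thread" as 4 * 2**30 bytes is an assumption.

```python
# Counts and sizes copied from the heap() output above.
n_grids = 441317             # yt.lagos.HierarchyType.EnzoGrid instances
grid_dict_bytes = 462500216  # "dict of yt.lagos.HierarchyType.EnzoGrid"
grid_obj_bytes = 31774824    # the EnzoGrid objects themselves
total_bytes = 1204579016     # total size reported for the whole heap

# Per-grid cost of the instance dicts and of the bare objects.
bytes_per_grid_dict = grid_dict_bytes // n_grids
bytes_per_grid_obj = grid_obj_bytes // n_grids

# Everything on the heap, averaged over the number of grids.
total_per_grid = total_bytes / float(n_grids)

# Assumption: "4GB per thread" means 4 * 2**30 bytes.
per_thread_bytes = 4 * 2**30
left_for_data = per_thread_bytes - total_bytes

print("dict per grid:   %d bytes" % bytes_per_grid_dict)
print("object per grid: %d bytes" % bytes_per_grid_obj)
print("total per grid:  %.1f bytes" % total_per_grid)
print("left for data:   %d bytes" % left_for_data)
```

So each grid's attribute dict is about 1KB on its own, and the hierarchy as a whole averages roughly 2.7KB per grid before any field data is read.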

> Load a single tile with varying sizes based
> on the number of processors, and see how many fields you can load
> before it dies.

I'm not exactly sure what you mean by this. However, I have been trying this script:

http://paste.enzotools.org/show/121/

and it dies if RunHOP is turned on, but runs fine to completion if I comment it out. Some of the threads run RunHOP before the whole thing dies. Here are the error messages I get with RunHOP on:

http://paste.enzotools.org/show/122/

Those error messages aren't like anything I've seen when doing a real run of HOP. However, the error messages in the real runs have been so cryptic and inconsistent that I can't say whether any of these errors are the same thing. I can say that I ran the script above twice and got exactly the same error messages both times, which is more consistent than the regular HOP runs have been.

I just ran the script above on a very small dataset and it didn't crash, so I don't think there's anything inherently wrong with the script.

 _______________________________________________________
sskory at physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________


