[yt-users] domain-decomposition for volume rendering huge brick?

Stuart Levy salevy at illinois.edu
Tue Dec 2 14:11:34 PST 2014


Sam: yes, that fix works great!


On 11/23/14 2:58 PM, Sam Skillman wrote:
> Sorry, I hit send too early, and Google Inbox doesn't have "undo"!
>
> So if you change that line, then at least the load_uniform_grid() call
> takes about 0.02 seconds for me.
>
> If you could test that out and report back, that would be awesome. 
> Once you make that change, you'll need to go to the root yt directory, 
> like path/src/yt-hg probably, then type "python setup.py develop".
>
> I'll be flying for the next few hours, but if that all works, let us
> know, and I can submit a pull request to fix it up.
>
> Cheers,
> Sam
>
> On Sun Nov 23 2014 at 12:55:25 PM Sam Skillman <samskillman at gmail.com> wrote:
>
>     Hi Stuart,
>
>     I just tried to do the same thing on a fake 8GB file and am seeing
>     similar things. After killing the job, it exited while doing this:
>     ^CTraceback (most recent call last):
>       File "test_load.py", line 11, in <module>
>         ds = yt.load_uniform_grid(data, (1024, 1024, 1024))
>       File "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py", line 605, in load_uniform_grid
>         field_units, data = unitify_data(data)
>       File "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py", line 494, in unitify_data
>         data = dict((field, np.array(val)) for field, val in data.iteritems())
>       File "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py", line 494, in <genexpr>
>         data = dict((field, np.array(val)) for field, val in data.iteritems())
>     KeyboardInterrupt
>
>     Looking at line 605 in yt/frontends/stream/data_structures.py, it
>     was choking when trying to "unitify_data". However, if you go in
>     and change line 494 from:
>
>         data = dict((field, np.array(val)) for field, val in data.iteritems())
>
>     to:
>
>         data = dict((field, val) for field, val in data.iteritems())
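>
>     To see why that one-line change matters for a memmapped brick:
>     np.array(val) makes a fresh in-memory copy of each field, which
>     forces every page of a np.memmap to be read at load time, while
>     passing val through leaves the data lazily mapped. A tiny
>     standalone sketch of the difference (hypothetical file, not the
>     real brick):
>
>         import numpy as np
>
>         # hypothetical small file standing in for the real brick
>         shape = (64, 64, 64)
>         np.zeros(shape, dtype="float32").tofile("tiny_brick.dat")
>
>         mm = np.memmap("tiny_brick.dat", dtype="float32", mode="r",
>                        shape=shape)
>         lazy = {"density": mm}             # pages are read on demand
>         eager = {"density": np.array(mm)}  # copies everything into RAM now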
>
>
>
>     On Fri Nov 21 2014 at 3:07:49 PM Stuart Levy <salevy at illinois.edu> wrote:
>
>         OK, so I have experimented, though not much.
>
>         First, sheepishly: I was off by a decimal place in the
>         original file size.   It's a unigrid with 1.5e10, not 1.5e11
>         voxels - big enough to be a nuisance but not heroic.
>
>         Second: load_uniform_grid() on a big numpy.memmap()'ed file,
>         even a modest 8GB fraction of the full grid, takes a long time
>         - many tens of minutes?   I ran out of time slice before it
>         finished even doing that.   Note this was just calling
>         load_uniform_grid(), not any attempt at calculation yet.
>
>         Speculation: something sweeps through the memory, causes a
>         page fault, sweeps a bit more, another page fault, etc.  So
>         there'd be many small I/O calls triggered sequentially,
>         wasting lots of time.   Could that be?  If so, I'm wondering
>         whether it would be possible to discover which portions of
>         the array will be in each node's domain, and prefetch those
>         in bulk first, using a few very efficient huge I/O calls
>         (maybe via madvise()).
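>
>         For concreteness, a minimal sketch of that madvise() idea,
>         with a hypothetical shape, dtype and filename (mmap.madvise
>         needs Python >= 3.8 on a POSIX system):
>
>             import mmap
>             import numpy as np
>
>             shape = (1024, 1024, 1024)      # hypothetical brick geometry
>             dtype = np.dtype("float32")
>             row_bytes = shape[1] * shape[2] * dtype.itemsize
>             z0, z1 = 0, 256                 # this node's slab, slow axis
>
>             f = open("brick.dat", "rb")     # hypothetical filename
>             mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
>             # hint the kernel to read the whole slab ahead of time
>             mm.madvise(mmap.MADV_WILLNEED, z0 * row_bytes,
>                        (z1 - z0) * row_bytes)
>             # wrap the mapped slab as an array without copying
>             slab = np.frombuffer(mm, dtype=dtype,
>                                  count=(z1 - z0) * shape[1] * shape[2],
>                                  offset=z0 * row_bytes)
>             slab = slab.reshape(z1 - z0, shape[1], shape[2])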
>
>         Either that, or if I can do my own domain decomposition up
>         front and *tell* the AMRKDTree which nodes own which slabs of
>         the grid, then I could just read() them in - also efficiently -
>         and let yt do any further decomposition, maybe.
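>
>         A sketch of the bookkeeping for that by-hand route, assuming
>         mpi4py and a hypothetical file layout; how to hand the
>         per-rank slabs to the AMRKDTree is exactly the open question:
>
>             import numpy as np
>             from mpi4py import MPI          # assumes mpi4py is available
>
>             comm = MPI.COMM_WORLD
>             rank, size = comm.rank, comm.size
>
>             shape = (1024, 1024, 1024)      # hypothetical brick geometry
>             dtype = np.dtype("float32")
>             row_bytes = shape[1] * shape[2] * dtype.itemsize
>
>             # split along the slowest-varying axis so each rank's slab
>             # is contiguous on disk and needs only one large read
>             z0 = rank * shape[0] // size
>             z1 = (rank + 1) * shape[0] // size
>
>             with open("brick.dat", "rb") as f:   # hypothetical filename
>                 f.seek(z0 * row_bytes)
>                 slab = np.fromfile(f, dtype=dtype,
>                                    count=(z1 - z0) * shape[1] * shape[2])
>             slab = slab.reshape(z1 - z0, shape[1], shape[2])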
>
>         Does either route make sense?   Is there code I should look at?
>
>         Thanks as ever
>
>
>             Stuart
>
>
>         On 11/7/14 1:33 PM, Sam Skillman wrote:
>>         Yep, the volume rendering should build the AMRKDTree itself,
>>         and *should* automatically decompose the giant brick into Np
>>         pieces. As for memory, you may need to (eek) allow for yt
>>         casting the data to 64-bit floats, but you'll have to
>>         experiment a bit.
>>
>>         Sam
>>
>>         On Fri Nov 07 2014 at 11:15:13 AM Stuart Levy <salevy at illinois.edu> wrote:
>>
>>             Thank you, Sam!   I think this makes sense.   Except, in
>>             case (1), do I need to do something to bring the
>>             AMRKDTree into the picture?   Or are you telling me that
>>             it is automatically constructed whenever you
>>             load_uniform_grid(), or volume-render it?
>>
>>             I think the available nodes have 64GB, so to load the
>>             whole ~600GB might take at least 32 nodes or 1024 cores.
>>
>>             Will let you know how it goes!
>>
>>
>>             On 11/7/14 11:08 AM, Sam Skillman wrote:
>>>             Ack, my calculation of 256-512 cores is probably low...
>>>             feel free to push up much higher.
>>>
>>>             On Fri Nov 07 2014 at 9:03:51 AM Sam Skillman <samskillman at gmail.com> wrote:
>>>
>>>                 Hi Stuart,
>>>
>>>                 On Thu Nov 06 2014 at 8:36:28 AM Stuart Levy <salevy at illinois.edu> wrote:
>>>
>>>                     Hello all,
>>>
>>>                     We're hoping to use yt parallel volume rendering
>>>                     on a very large generic
>>>                     brick - it's a simple rectangular unigrid slab,
>>>                     but containing something
>>>                     like 1.5e11 points, so much too large for
>>>                     load_uniform_grid() to load into memory on a
>>>                     single machine.
>>>
>>>
>>>                 Are you loading directly using something like
>>>                 numpy.fromfile?  If so, I think the easiest method
>>>                 would be to replace that with a np.memmap
>>>                 (http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html).
>>>                 Once that is loaded, you should be able to use
>>>                 load_uniform_grid.
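>>>
>>>                 A minimal sketch of that combination (filename,
>>>                 shape, field name and units here are all
>>>                 placeholders, not values from this thread):
>>>
>>>                     import numpy as np
>>>                     import yt
>>>
>>>                     shape = (1024, 1024, 1024)   # placeholder dims
>>>                     arr = np.memmap("brick.dat", dtype="float32",
>>>                                     mode="r", shape=shape)
>>>
>>>                     # field name and units are assumptions
>>>                     data = {"density": (arr, "g/cm**3")}
>>>                     ds = yt.load_uniform_grid(data, shape)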
>>>
>>>                 At that point, there are two possible routes that
>>>                 both may or may not work well.
>>>
>>>                 1) Just try rendering with ~256-512 cores, and the
>>>                 AMRKDTree should try to geometrically split the grid
>>>                 before performing any I/O.
>>>                 or
>>>                 2) Use load_uniform_grid with the keyword nprocs=N
>>>                 (for this size simulation, you probably need
>>>                 something like 256-1024 processors depending on the
>>>                 memory per core); see the sketch below. This should
>>>                 be equivalent to (1), but it may hit the I/O here
>>>                 instead of in the kd-tree.
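>>>
>>>                 Continuing the sketch above, route (2) would look
>>>                 something like this (nprocs=512 is just a
>>>                 placeholder; scale it to your core count and the
>>>                 memory per core):
>>>
>>>                     # same data/shape as in the sketch above; yt
>>>                     # splits the brick into 512 sub-grids at load
>>>                     ds = yt.load_uniform_grid(data, shape, nprocs=512)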
>>>
>>>                 I *think* (1) should be your best option, but I
>>>                 haven't tried rendering this large of a single-grid
>>>                 output.
>>>
>>>                 When you build the camera object, definitely start
>>>                 out using the keyword "no_ghost=True", as this will
>>>                 extrapolate rather than interpolate from boundary
>>>                 grids to the vertices. The rendering quality won't
>>>                 be quite as good, but for unigrid simulations there
>>>                 isn't a tremendous difference.
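>>>
>>>                 For example, with the yt-3.0-era camera interface
>>>                 (every number below is a placeholder, and the exact
>>>                 keyword set may differ between yt versions):
>>>
>>>                     # placeholder bounds for a density transfer function
>>>                     tf = yt.ColorTransferFunction((-30, -25))
>>>                     tf.add_layers(5)
>>>
>>>                     cam = ds.camera([0.5, 0.5, 0.5],  # view center
>>>                                     [1.0, 1.0, 1.0],  # view direction
>>>                                     0.5,              # width of region
>>>                                     512,              # image resolution
>>>                                     tf,
>>>                                     fields=["density"],
>>>                                     no_ghost=True)
>>>                     cam.snapshot("render.png")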
>>>
>>>                 Let us know how that goes!  I'd be very excited to
>>>                 see images from such a large sim...
>>>
>>>                 Sam
>>>
>>>
>>>                     I imagine it wouldn't be hard to do the domain
>>>                     decomposition by hand,
>>>                     loading a different chunk of grid into each MPI
>>>                     process.   But then
>>>                     what?   What would it take to invoke the volume
>>>                     renderer on each piece
>>>                     and composite them together?   Would it help if
>>>                     the chunks were stored
>>>                     in a KDTree?   Is there some example (one of the
>>>                     existing data loaders?)
>>>                     which I could follow?
>
>
>
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
