[yt-users] domain-decomposition for volume rendering huge brick?

Matthew Turk matthewturk at gmail.com
Sun Nov 23 12:58:12 PST 2014


Hi Sam,

This looks like a bug!  It may be fixed by replacing np.array with
np.asarray.

-Matt

On Sun Nov 23 2014 at 2:55:33 PM Sam Skillman <samskillman at gmail.com> wrote:

> Hi Stuart,
>
> I just tried to do the same thing on a fake 8GB file and am seeing similar
> things. After killing the job, it exited while doing this:
> ^CTraceback (most recent call last):
>   File "test_load.py", line 11, in <module>
>     ds = yt.load_uniform_grid(data, (1024, 1024, 1024))
>   File
> "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py",
> line 605, in load_uniform_grid
>     field_units, data = unitify_data(data)
>   File
> "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py",
> line 494, in unitify_data
>     data = dict((field, np.array(val)) for field, val in data.iteritems())
>   File
> "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py",
> line 494, in <genexpr>
>     data = dict((field, np.array(val)) for field, val in data.iteritems())
> KeyboardInterrupt
>
> Looking at line 605 in yt/frontends/stream/data_structures.py, it was
> choking when trying to "unitify_data". However, if you go in and change
> line 494 from:
>  data = dict((field, np.array(val)) for field, val in data.iteritems())
> to
>  data = dict((field, val) for field, val in data.iteritems())
>
>
>
> On Fri Nov 21 2014 at 3:07:49 PM Stuart Levy <salevy at illinois.edu> wrote:
>
>>  OK, so I have experimented, though not much.
>>
>> First, sheepishly: I was off by a decimal place in the original file
>> size.   It's a unigrid with 1.5e10, not 1.5e11 voxels - big enough to be a
>> nuisance but not heroic.
>>
>> Second: load_uniform_grid() on a big numpy.memmap()'ed file, even a
>> modest 8GB fraction of the full grid, takes a long time - many tens of
>> minutes?   I ran out of time slice before it finished even doing that.
>> Note this was just calling load_uniform_grid(), not any attempt at
>> calculation yet.
>>
>> Speculation: something sweeps through the memory, causes a page fault,
>> sweeps a bit more, another page fault, etc.  So there'd be many small I/O
>> calls triggered sequentially, wasting lots of time.   Could that be?  If
>> so, then I'm wondering if it could be possible to discover which portions
>> of the array will be in each node's domain, and prefetch those in bulk
>> first, using a few very efficient huge I/O calls (maybe via madvise()).
>>
>> Either that, or if I can do my own domain decomposition up front and
>> *tell* the AMRKDTree which nodes own which slabs of grid, then I could just
>> read() them in - also efficiently - and let yt do any further
>> decomposition, maybe.
>>
>> Does either route make sense?   Is there code I should look at?
>>
>> Thanks as ever
>>
>>
>>     Stuart
>>
>>
>> On 11/7/14 1:33 PM, Sam Skillman wrote:
>>
>> Yep, the volume rendering should build the AMRKDTree itself, and *should*
>> automatically decompose the giant brick into Np pieces. As for memory, you
>> may need to (eek) allow for yt casting to 64-bit floats for the data, but
>> you'll have to just experiment a bit.
>>
>>  Sam
>>
>> On Fri Nov 07 2014 at 11:15:13 AM Stuart Levy <salevy at illinois.edu>
>> wrote:
>>
>>>  Thank you, Sam!   I think this makes sense.   Except, in case (1), do I
>>> need to do something to bring the AMRKDTree into the picture?   Or are you
>>> telling me that it is automatically constructed whenever you
>>> load_uniform_grid(), or volume-render it?
>>>
>>> I think the available nodes have 64GB, so to load the whole ~600GB might
>>> take at least 32 nodes or 1024 cores.
>>>
>>> Will let you know how it goes!
>>>
>>>
>>> On 11/7/14 11:08 AM, Sam Skillman wrote:
>>>
>>> Ack, my calculation of 256-512 cores is probably low... feel free to
>>> push up much higher.
>>>
>>> On Fri Nov 07 2014 at 9:03:51 AM Sam Skillman <samskillman at gmail.com>
>>> wrote:
>>>
>>>> Hi Stuart,
>>>>
>>>>  On Thu Nov 06 2014 at 8:36:28 AM Stuart Levy <salevy at illinois.edu>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> We're hoping to use yt parallel volume rendering on a very large
>>>>> generic
>>>>> brick - it's a simple rectangular unigrid slab, but containing
>>>>> something
>>>>> like 1.5e11 points, so much too large for load_uniform_grid() to load
>>>>> into memory in a single machine.
>>>>>
>>>>
>>>>    Are you loading directly using something like numpy.fromfile?  If
>>>> so, I think the easiest method would be to replace that with a np.memmap (
>>>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html).
>>>> Once that is loaded, you should be able to use load_uniform_grid.
>>>>
>>>>  At that point, there are two possible routes that both may or may not
>>>> work well.
>>>>
>>>>  1) Just try rendering with ~256-512 cores, and the AMRKDTree should
>>>> try to geometrically split the grid before performing and I/O.
>>>> or
>>>> 2) Use load_uniform_grid with the keyword nprocs=N ( for this size
>>>> simulation, you probably need something like 256-1024 processors depending
>>>> on the memory per core). This should do the equivalent thing to (1), but it
>>>> may hit the I/O here instead of in the kd-tree.
>>>>
>>>>  I *think* (1) should be your best option, but I haven't tried
>>>> rendering this large of a single-grid output.
>>>>
>>>>  When you build the camera option, definitely start out using the
>>>> keyword "no_ghost=True", as this will extrapolate rather than interpolate
>>>> from boundary grids to the vertices. The rendering quality won't be quite
>>>> as good but for unigrid simulations there isn't a tremendous difference.
>>>>
>>>>  Let us know how that goes!  I'd be very excited to see images from
>>>> such a large sim...
>>>>
>>>>  Sam
>>>>
>>>>
>>>>
>>>>>
>>>>> I imagine it wouldn't be hard to do the domain decomposition by hand,
>>>>> loading a different chunk of grid into each MPI process.   But then
>>>>> what?   What would it take to invoke the volume renderer on each piece
>>>>> and composite them together?   Would it help if the chunks were stored
>>>>> in a KDTree?   Is there some example (one of the existing data
>>>>> loaders?)
>>>>> which I could follow?
>>>>> _______________________________________________
>>>>> yt-users mailing list
>>>>> yt-users at lists.spacepope.org
>>>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> yt-users mailing listyt-users at lists.spacepope.orghttp://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>
>>>
>>>  _______________________________________________
>>> yt-users mailing list
>>> yt-users at lists.spacepope.org
>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>
>>
>>
>> _______________________________________________
>> yt-users mailing listyt-users at lists.spacepope.orghttp://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>
>>
>>  _______________________________________________
>> yt-users mailing list
>> yt-users at lists.spacepope.org
>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.spacepope.org/pipermail/yt-users-spacepope.org/attachments/20141123/faa00d45/attachment.htm>


More information about the yt-users mailing list