[yt-users] domain-decomposition for volume rendering huge brick?

Sam Skillman samskillman at gmail.com
Sun Nov 23 12:58:57 PST 2014


Sorry, hit send too early, and Google Inbox doesn't have "undo"!

So if you change that line, then at least the load_uniform_grid() call for
me takes about 0.02 seconds.
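For anyone curious why that one-line change matters: np.array(val) makes an eager in-memory copy, so when val is a huge np.memmap it forces the whole file to be read before load_uniform_grid even gets going. Keeping the memmap object defers the I/O until slices are actually touched. A tiny self-contained illustration (the file and names are made up for the demo):

```python
import os
import tempfile

import numpy as np

# Tiny stand-in for the huge on-disk brick.
path = os.path.join(tempfile.mkdtemp(), "brick.dat")
np.arange(1024, dtype=np.float64).tofile(path)

mm = np.memmap(path, dtype=np.float64, mode="r", shape=(1024,))

eager = np.array(mm)     # full copy into RAM -- what the old line 494 did
lazy = dict(density=mm)  # keeps the memmap; no bulk read happens here

print(type(eager).__name__)            # ndarray
print(type(lazy["density"]).__name__)  # memmap
```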

If you could test that out and report back, that would be awesome. Once you
make that change, you'll need to go to the root yt directory, like
path/src/yt-hg probably, then type "python setup.py develop".

I'll be flying for the next few hours, but if that all works, let us know,
and I can submit a pull request to fix it up.

Cheers,
Sam

On Sun Nov 23 2014 at 12:55:25 PM Sam Skillman <samskillman at gmail.com>
wrote:

> Hi Stuart,
>
> I just tried to do the same thing on a fake 8GB file and am seeing similar
> things. After killing the job, it exited while doing this:
> ^CTraceback (most recent call last):
>   File "test_load.py", line 11, in <module>
>     ds = yt.load_uniform_grid(data, (1024, 1024, 1024))
>   File
> "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py",
> line 605, in load_uniform_grid
>     field_units, data = unitify_data(data)
>   File
> "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py",
> line 494, in unitify_data
>     data = dict((field, np.array(val)) for field, val in data.iteritems())
>   File
> "/home/skillman/local/yt-x86_64/src/yt-3.0/yt/frontends/stream/data_structures.py",
> line 494, in <genexpr>
>     data = dict((field, np.array(val)) for field, val in data.iteritems())
> KeyboardInterrupt
>
> Looking at line 605 in yt/frontends/stream/data_structures.py, it was
> choking when trying to "unitify_data". However, if you go in and change
> line 494 from:
>  data = dict((field, np.array(val)) for field, val in data.iteritems())
> to
>  data = dict((field, val) for field, val in data.iteritems())
>
>
>
> On Fri Nov 21 2014 at 3:07:49 PM Stuart Levy <salevy at illinois.edu> wrote:
>
>>  OK, so I have experimented, though not much.
>>
>> First, sheepishly: I was off by a decimal place in the original file
>> size.   It's a unigrid with 1.5e10, not 1.5e11 voxels - big enough to be a
>> nuisance but not heroic.
>>
>> Second: load_uniform_grid() on a big numpy.memmap()'ed file, even a
>> modest 8GB fraction of the full grid, takes a long time - many tens of
>> minutes?   I ran out of time slice before it finished even doing that.
>> Note this was just calling load_uniform_grid(), not any attempt at
>> calculation yet.
>>
>> Speculation: something sweeps through the memory, causes a page fault,
>> sweeps a bit more, another page fault, etc.  So there'd be many small I/O
>> calls triggered sequentially, wasting lots of time.   Could that be?  If
>> so, then I'm wondering if it could be possible to discover which portions
>> of the array will be in each node's domain, and prefetch those in bulk
>> first, using a few very efficient huge I/O calls (maybe via madvise()).
>>
>> Either that, or if I can do my own domain decomposition up front and
>> *tell* the AMRKDTree which nodes own which slabs of grid, then I could just
>> read() them in - also efficiently - and let yt do any further
>> decomposition, maybe.
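>> As a concrete sketch of the decompose-up-front idea (my names here, not
>> yt API): each rank could compute the byte range of its x-slab in the
>> C-ordered brick on disk and then fetch it with one large read(), e.g.:

```python
def slab_byte_range(rank, nranks, shape, itemsize=4):
    """Byte (offset, length) of this rank's x-slab in a C-ordered brick."""
    nx, ny, nz = shape
    # Distribute the nx planes as evenly as possible across ranks.
    counts = [nx // nranks + (1 if r < nx % nranks else 0)
              for r in range(nranks)]
    x0 = sum(counts[:rank])
    plane_bytes = ny * nz * itemsize
    return x0 * plane_bytes, counts[rank] * plane_bytes

# e.g. rank 3 of 8 on a 1024^3 float32 brick:
off, length = slab_byte_range(3, 8, (1024, 1024, 1024))
print(off, length)  # 1610612736 536870912
```

>> (Each rank would then seek() to its offset and read() length bytes in
>> one go, instead of paging the slab in piecemeal.)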
>>
>> Does either route make sense?   Is there code I should look at?
>>
>> Thanks as ever
>>
>>
>>     Stuart
>>
>>
>> On 11/7/14 1:33 PM, Sam Skillman wrote:
>>
>> Yep, the volume rendering should build the AMRKDTree itself, and *should*
>> automatically decompose the giant brick into Np pieces. As for memory, you
>> may need to (eek) allow for yt casting to 64-bit floats for the data, but
>> you'll have to just experiment a bit.
>>
>>  Sam
>>
>> On Fri Nov 07 2014 at 11:15:13 AM Stuart Levy <salevy at illinois.edu>
>> wrote:
>>
>>>  Thank you, Sam!   I think this makes sense.   Except, in case (1), do I
>>> need to do something to bring the AMRKDTree into the picture?   Or are you
>>> telling me that it is automatically constructed whenever you
>>> load_uniform_grid(), or volume-render it?
>>>
>>> I think the available nodes have 64GB, so to load the whole ~600GB might
>>> take at least 32 nodes or 1024 cores.
>>>
>>> Will let you know how it goes!
>>>
>>>
>>> On 11/7/14 11:08 AM, Sam Skillman wrote:
>>>
>>> Ack, my calculation of 256-512 cores is probably low... feel free to
>>> push up much higher.
>>>
>>> On Fri Nov 07 2014 at 9:03:51 AM Sam Skillman <samskillman at gmail.com>
>>> wrote:
>>>
>>>> Hi Stuart,
>>>>
>>>>  On Thu Nov 06 2014 at 8:36:28 AM Stuart Levy <salevy at illinois.edu>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> We're hoping to use yt parallel volume rendering on a very large
>>>>> generic
>>>>> brick - it's a simple rectangular unigrid slab, but containing
>>>>> something
>>>>> like 1.5e11 points, so much too large for load_uniform_grid() to load
>>>>> into memory in a single machine.
>>>>>
>>>>
>>>>    Are you loading directly using something like numpy.fromfile?  If
>>>> so, I think the easiest method would be to replace that with a np.memmap (
>>>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html).
>>>> Once that is loaded, you should be able to use load_uniform_grid.
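>>>> Something like this minimal, untested sketch (the file name, dtype,
>>>> and shape are small stand-ins for the real brick):

```python
import os
import tempfile

import numpy as np

# Stand-in for the real raw brick on disk (the actual grid is far larger).
shape = (64, 64, 64)
path = os.path.join(tempfile.mkdtemp(), "brick.f32")
np.zeros(shape, dtype=np.float32).tofile(path)

# Memory-map instead of numpy.fromfile: nothing is read until touched.
data = np.memmap(path, dtype=np.float32, mode="r", shape=shape)

# Then hand the lazy array to yt as usual (uncomment with yt installed):
# import yt
# ds = yt.load_uniform_grid({"density": data}, shape)
```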
>>>>
>>>>  At that point, there are two possible routes that both may or may not
>>>> work well.
>>>>
>>>>  1) Just try rendering with ~256-512 cores, and the AMRKDTree should
>>>> try to geometrically split the grid before performing any I/O.
>>>> or
>>>> 2) Use load_uniform_grid with the keyword nprocs=N (for this size
>>>> simulation, you probably need something like 256-1024 processors depending
>>>> on the memory per core). This should do the equivalent of (1), but it
>>>> may hit the I/O here instead of in the kd-tree.
>>>>
>>>>  I *think* (1) should be your best option, but I haven't tried
>>>> rendering this large of a single-grid output.
>>>>
>>>>  When you build the camera option, definitely start out using the
>>>> keyword "no_ghost=True", as this will extrapolate rather than interpolate
>>>> from boundary grids to the vertices. The rendering quality won't be quite
>>>> as good but for unigrid simulations there isn't a tremendous difference.
>>>>
>>>>  Let us know how that goes!  I'd be very excited to see images from
>>>> such a large sim...
>>>>
>>>>  Sam
>>>>
>>>>
>>>>
>>>>>
>>>>> I imagine it wouldn't be hard to do the domain decomposition by hand,
>>>>> loading a different chunk of grid into each MPI process.   But then
>>>>> what?   What would it take to invoke the volume renderer on each piece
>>>>> and composite them together?   Would it help if the chunks were stored
>>>>> in a KDTree?   Is there some example (one of the existing data
>>>>> loaders?)
>>>>> which I could follow?
>>>>> _______________________________________________
>>>>> yt-users mailing list
>>>>> yt-users at lists.spacepope.org
>>>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>