[yt-users] seg fault in Quadtree

Matthew Turk matthewturk at gmail.com
Thu Feb 27 11:55:44 PST 2014


Hi Britton,

On Thu, Feb 27, 2014 at 12:24 PM, Britton Smith <brittonsmith at gmail.com> wrote:
> Hi Nathan, Matt,
>
> I'm working on getting some more debugging information with your
> suggestions.  So far, I've been able to track it to the loop inside
> QuadTree.initialize_chunk.  This appears to loop over all the points within
> the chunk, calling add_to_position for each point.  The loop looks like
> this:
>
>         for p in range(num):
>             pos[0] = pxs[p]
>             pos[1] = pys[p]
>             self.add_to_position(level[p], pos, NULL, 0.0, 1)
>
> If I print the values of p, level[p], pos[0], pos[1] inside this loop, I see
> the following (with a few extra lines leading up):
>
> 1893689 22 1148578047 1106259970
> 1893690 22 1148578047 1106259971
> 1893691 22 1148578047 1106259972
> 1893692 22 1148578047 1106259973
> 1893693 22 1148578047 1106259979
> 1893694 23 -1997811214 -2082447348
>
> So, somehow, starting on level 23, the x and y positions are messed up in
> some way.  Is this a precision issue?  How are these positions calculated?

Yeah, this looks like the problem.  These positions are computed by doing:

cell_integer_index + grid.get_global_index()

If you're on 3.0, this is all done implicitly inside icoords.  One way
to avoid the segfault and determine the specific place it fails is to
go into grid_object.py and inside icoords, assert that the values are
all non-negative -- no matter the domain, this is always the case.  If
you're using octrees, this would go into octree_subset.py.
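
As a rough sketch of that check (not yt's actual code: check_icoords and
first_negative_row are hypothetical helper names, and the input is assumed
to be any sequence of integer coordinate rows, such as the array that
icoords returns), something like this would report the first bad row
instead of letting it reach the quadtree:

```python
def first_negative_row(icoords):
    """Return (row_index, row) for the first row with a negative value, else None."""
    for i, row in enumerate(icoords):
        if any(v < 0 for v in row):
            return i, tuple(row)
    return None


def check_icoords(icoords):
    """Fail loudly if any integer coordinate is negative, then pass the data through."""
    bad = first_negative_row(icoords)
    # The message is only formatted when the assertion actually fails.
    assert bad is None, "negative integer coordinate at row %d: %r" % bad
    return icoords
```

Wrapping the return value of icoords in a call like check_icoords(...) should
turn the segfault into an AssertionError with an ordinary Python traceback.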

It may be that you're on a machine where the default int is 32 bits
rather than 64, and somewhere the code carelessly assumes 64.  If you
try on a different machine it might work.  That would help track all
of this down.
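
To illustrate the suspected overflow, here is a minimal sketch in plain
Python.  It assumes a refinement factor of 2, so integer coordinates double
with each level.  wrap_int32 is a hypothetical helper that mimics storing a
value in a signed 32-bit int, and the level 22 inputs are back-computed from
the level 23 values you printed (they sit in the same range as your printed
level 22 values) rather than copied from your output:

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reinterpret a Python int as a signed 32-bit integer (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


# Assumed level 22 integer coordinates (x, y), chosen for illustration.
x22, y22 = 1148578041, 1106259974

# Assumption: one level deeper doubles the integer coordinate.
x23, y23 = 2 * x22, 2 * y22

print(x23 > INT32_MAX, y23 > INT32_MAX)  # True True
print(wrap_int32(x23), wrap_int32(y23))  # -1997811214 -2082447348
```

Both wrapped values match the level 23 line in your output, which is
consistent with a 32-bit integer somewhere in the chain.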

-Matt

>
> Britton
>
>
> On Thu, Feb 27, 2014 at 5:54 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
>>
>> Hi Britton,
>>
>> On my machine it will tell me line numbers in the .c file if a crash
>> happens inside a .so file, even if it's called from python.  I'm not sure
>> how to get that information on your system without knowing more about your
>> setup.
>>
>> PDB doesn't know about C extensions so that won't be helpful
>> unfortunately.
>>
>> If you're running serially you should be able to run python under gdb and
>> get a traceback that way.  I'm not sure how to do that for parallel runs.
>>
>> This page might be helpful:
>> http://docs.cython.org/src/userguide/debugging.html
>>
>> Nathan
>>
>> On Thursday, February 27, 2014, Britton Smith <brittonsmith at gmail.com>
>> wrote:
>>>
>>> Hi Nathan,
>>>
>>> I'm having a hard time getting a traceback that goes into the QuadTree
>>> source.  The seg fault I get stops at QuadTree.so.  Is there a way to
>>> recompile this in debug mode to get some more information?  It doesn't look
>>> like pdb is able to step into QuadTree either.
>>>
>>> Britton
>>>
>>>
>>> On Thu, Feb 27, 2014 at 5:22 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>>> wrote:
>>>>
>>>> Hi Britton,
>>>>
>>>> Can you get a traceback from the seg fault?  It would help to see the
>>>> line number in the autogenerated QuadTree.c where the crash happens.
>>>> Autogenerated C files produced by cython reproduce the original .pyx files
>>>> line by line as comments so it's usually pretty easy to back out where the
>>>> crash is happening in the original Pyrex file.
>>>>
>>>> Nathan
>>>>
>>>>
>>>> On Thursday, February 27, 2014, Britton Smith <brittonsmith at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying to make projections of a rather large Enzo dataset and
>>>>> getting a segfault somewhere in QuadTree.so.  This dataset is ~230 GB in
>>>>> size with 27 levels of AMR.  As far as I can tell, the only hard coded limit
>>>>> I could find in QuadTree.pyx is for 80 levels, which I am clearly below.
>>>>> Does anyone familiar with this part of the code have any idea if there are
>>>>> any other hard-coded limits in here that I might be exceeding?  If not, does
>>>>> anyone have any advice for how I might debug this?  I'm seeing this behavior
>>>>> in both yt-2.x and yt-3.0, so it does seem to be something intrinsic to the
>>>>> quadtree code.
>>>>>
>>>>> Thanks!
>>>>> Britton
>>>>
>>>>
>>>> _______________________________________________
>>>> yt-users mailing list
>>>> yt-users at lists.spacepope.org
>>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>>
>>>
>>
>>
>
>
>


