[yt-dev] Cut Region Failure

Thu Oct 6 11:38:03 PDT 2016

Hi Nathan,

You're right, the artifacts I saw were related to pixelization. When I do
as you suggest:

ad = ds.all_data()
my_cut_region = ad.cut_region('(obj["temperature"] > 1e4) &
(obj["temperature"] < 1e6)')
slice = ds.slice(axis, coort, data_source=my_cut_region)
temp = slice['temperature']
print (temp.min(), temp.max())

Here's the result:
(10002.1923828 K, 999992.375 K)

So, the temperature values are all within the cut_region. Thanks for the
suggestion!

As for the morton indices, I was surprised that they were all identical as
well. After playing around with it, I discovered that I there's a
difference in how straight Python handles types versus Cython: in
particular, the code in my Python function pos_to_morton() used to restrict
the input number to a valid range resulted in overflows consistently (hence
the identical values). The line in question is:

ii[i] = np.min([(1 << ORDER_MAX)-1, np.max([0, ii[i]])])

I used this line instead of i64clip() from fp_utils.pyx. If I use this:

ii[i] &= np.uint64((1 << ORDER_MAX) - 1)

I get unique morton indices for the entire domain with the yt algorithm.
So, the implementation in yt is probably fine, assuming that Cython handles
the variable types appropriately in i64clip(). Although, It might make
sense to move to Baert's algorithm in future for super high resolution
simulations, since he uses 21 bits, vs. 20 bits (ORDER_MAX in
yt/utilities/lib/geometry_lib.pyx).

My apologies for the wild goose chase(s), and thanks so much for the help!

Cheers,
Jason

On Thu, Oct 6, 2016 at 11:26 AM, Nathan Goldbaum <nathan12343 at gmail.com>
wrote:

> Hi Jason,
>
> I'm having some trouble digesting your last message. Unfortunately it's
> hard for me to say exactly what's happening because I can't run the tests
> you've described locally. It would help me to discuss the issues you're
> having if you can share a test script that I can run locally. You can share
> it via the yt pastebin service to share it.
>
> If you can't share the dataset you're using, you can use one of the
> publicly available test datasets on yt-project.org/data.
>
> If you can share the dataset you're using, you might find the yt curldrop
> to be useful for sharing it: https://docs.hub.yt/services.html#curldrop
>
> Just to speculate, I bet the artifacts you're seeing are due to the
> pixelization rather than the data selection. One way to verify would be to
> create a slice data object that uses the cut_region you've defined as a
> data source and then query the temperature field and get the extrema of
> that data. Something like:
>
> slice = ds.slice(axis, coort, data_source=my_cut_region)
> temp = slice['temperature']
> print (temp.min(), temp.max())
>
> If it is due to the pixelization, it might be tricky to fix, but you
> should be able to at least set the colorbar range in your plots manually so
> the issue is less visible.
>
> I'm surprised that the morton indices you're calculating are all identical
> given the fact that the CutRegionSelector seems to be mostly working, since
> the CutRegionSelector depends on all cells having unique Morton indices.
> Keep in mind that position_to_morton was not written with the intention of
> being part of the public API, so it's not really meant to be used in the
> way you're using it, I think.
>
> -Nathan
>
> On Thu, Oct 6, 2016 at 10:12 AM, Jason Galyardt <jason.galyardt at gmail.com>
> wrote:
>
>> Hi Nathan,
>>
>> The solution you came up with for issue 941
>> <https://bitbucket.org/yt_analysis/yt/issues/941/cutregionselector-doesnt-work-with-some>
>> that uses the morton indices + a tolerance factor seems to mostly work for
>> my FLASH simulations for both single conditionals, as well as the union of
>> 2 conditionals. The attached slice plot in temperature was created from one
>> of my FLASH simulations with the following conditionals: 1e4 K < T < 1e6 K.
>> It looks like there may be some small residual issues: there are a few
>> cells here and there that seem to have T ~= 10**-9 K, which is odd, since T
>> > 1e4 K was one of the conditionals.
>>
>> In a related but separate issue, I came across a problem with the
>> position_to_morton() function in geometry_utils.pyx. I looked at the morton
>> indices for my FLASH simulation: I get identical values for all cells. I
>> think this is basically a loss of precision, though I've not investigated
>> fully. Since the position_to_morton() function uses magic numbers from a StackOverflow
>> post
>> <http://stackoverflow.com/questions/1024754/how-to-compute-a-3d-morton-number-interleave-the-bits-of-3-ints>,
>> I thought I'd search for an alternative. I found one on Jeroen Baert's
>> blog
>> <http://www.forceflow.be/2013/10/07/morton-encodingdecoding-through-bit-interleaving-implementations/>.
>> This magic bit implementation spreads the bits out differently and
>> restricts the input to the first 21 bits in each dimension (so the morton
>> index for each cell is a 63 bit number). I wrote a very pedestrian python
>> script to test both of these implementations against my FLASH simulation:
>>
>> http://paste.yt-project.org/show/6844/
>>
>> I used two different values of the cell width (delta below and in the
>> script above):  one uses the same width as is used in
>> yt/utilities/lib/geometry_utils.pyx:
>>
>> delta[i] = (dle[i] - dre[i]) / (1 << ORDER_MAX)
>>
>> where dle is the domain left edge array, and dre is the domain right edge
>> array, and ORDER_MAX = 20. The other cell width is the minimum cell width
>> in each dimension. I found that using the yt implementation of cell width
>> (the ORDER_MAX one), all the morton indices are identical for the yt
>> implementation of position_to_morton() used in yt. Whereas, the yt
>> implementation of position_to_morton() works fine with the cell width set
>> to the minimum found in each dimension (i.e. all morton indices are unique).
>>
>> Jeroen Baert's morton index generation algorithm generates unique morton
>> indices for both implementations of the cell width array. The output of
>> this script when run on my FLASH simulations is below.
>>
>> Using Jeroen Baert's morton index generation algorithm doesn't seem to
>> affect the issue of the cut_region artifacts, as far as I can tell. That is
>> to say, the artifacts are still there when you don't include the tolerance
>> factor code. I've tested this by inserting Jeroen Baert's algorithm into
>> geometry_utils.pyx from Nathan's latest changeset (5cf30893afdf),
>> rebuilding, and then generating the same slice plot as attached. The plots
>> are identical to the eye (down to the small regions of T ~ 10**-9 K).
>>
>> So, I'm not sure what this all means because I don't have a clear picture
>> of how the morton indices are used in yt. I do find it odd that the hash
>> used in the cut_region() code still works when all the morton indices are
>> identical though. This is probably not terribly helpful; any ideas on how
>> we can shed more light on either of these issues? Anything else I can do?
>>
>>
>> Cheers,
>> Jason
>>
>> ------
>> Jason Galyardt
>> University of Georgia
>>
>>
>> Here's the output of the script, as promised:
>> ----------------------------------------------------------
>>
>> yt : [INFO     ] 2016-10-06 10:05:05,130 Particle file found:
>> smithCloud_hdf5_part_0115
>> yt : [INFO     ] 2016-10-06 10:05:05,137 integer runtime parameter
>> checkpointfilenumber overwrites a simulation scalar of the same name
>> yt : [INFO     ] 2016-10-06 10:05:05,137 integer runtime parameter
>> particlefilenumber overwrites a simulation scalar of the same name
>> yt : [INFO     ] 2016-10-06 10:05:05,138 integer runtime parameter
>> plotfilenumber overwrites a simulation scalar of the same name
>> yt : [INFO     ] 2016-10-06 10:05:05,151 Parameters:
>> current_time              = 1.81358087048e+15
>> yt : [INFO     ] 2016-10-06 10:05:05,151 Parameters:
>> domain_dimensions         = [16 16 96]
>> yt : [INFO     ] 2016-10-06 10:05:05,151 Parameters:
>> domain_left_edge          = [ -6.17136050e+21  -6.17136050e+21
>> -6.78849660e+22]
>> yt : [INFO     ] 2016-10-06 10:05:05,151 Parameters:
>> domain_right_edge         = [  6.17136050e+21   6.17136050e+21
>> 6.17136050e+21]
>> yt : [INFO     ] 2016-10-06 10:05:05,152 Parameters:
>> cosmological_simulation   = 0.0
>> level    # grids           # cells         # cells^3
>> ----------------------------------------------
>>   0        48             24576                30
>>   1       288            147456                53
>>   2      1856            950272                99
>>   3     10776           5517312               177
>> ----------------------------------------------
>>         12968           6639616
>>
>>
>> t = 1.81358087e+15 = 1.81358087e+15 s = 5.74689099e+07 years
>>
>> Smallest Cell:
>>     Width: 3.125e-05 Mpc
>>     Width: 3.125e+01 pc
>>     Width: 6.446e+06 AU
>>     Width: 9.643e+19 cm
>> len(ytMorton1): 5812736
>>
>> Using delta = [min(dx), min(dy), min(dz)]:
>> yt Morton: number of identical values: 0
>> Baert Morton: number of identical values: 0
>>
>> Using delta[i] = (dle[i] - dre[i]) / (1 << ORDER_MAX):
>> yt Morton: number of identical values: 5812735
>> Baert Morton: number of identical values: 0
>>
>>
>>
>>
>> On Sat, Oct 1, 2016 at 8:05 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>> wrote:
>>
>>> Hi Jason,
>>>
>>> I'm CCing yt-dev here so that this project discussion happens in public.
>>> I hope that's ok.
>>>
>>> Yes, it would be great if you could take a stab at this. Unfortunately
>>> the main blocker here is choosing a good, fast, periodic KDTree that can be
>>> used easily from within cython. Having this available would be great for
>>> many other things besides the CutRegionSelection. We in fact already have a
>>> copy of scipy's C KDTree implementation vendored inside yt that we'd like
>>> to get rid of. I believe there are plans to either add a dependency on or
>>> vendor a fast cython kdtree that Meagan Lang has been working on:
>>>
>>> https://bitbucket.org/langmm/cykdtree
>>>
>>> Earlier today I took a stab at implementing a naive O(N^2) search in
>>> CutRegionSelector.select_cell. Unfortunately it's very, very slow to do
>>> this so we're going to need some sort of acceleration in order to
>>> realistically accomplish this, either via .a tree approach or by using a
>>> hash search like we're currently doing via python's builtin set object.
>>>
>>> An alternate way around this that maintains the O(1) speed of the
>>> current hashmap-based approach would be to compute unique integer indexes
>>> for each cell inside the cut_region and save those instead of just saving
>>> the floating point position of each cell. When I originally wrote the
>>> CutRegionSelector there wasn't anything like that in yt, but since then
>>> we've added a way to compute morton indices for each cell, which should do
>>> the trick.
>>>
>>> In fact, I went ahead and took a stab at this after coming up with that
>>> idea while writing this e-mail:
>>>
>>> https://bitbucket.org/yt_analysis/yt/pull-requests/2405
>>>
>>> And unfortunately it still doesn't work perfectly - I still see
>>> artifacts (although they're less severe) in the image produced by the test
>>> script in the issue.
>>>
>>> I think my attempt in PR 2405 might be going on the right track - you
>>> could try to poke around with it, or test it with your data.
>>>
>>> Sorry to not have a satisfactory answer here. I was very excited about
>>> initially being able to implement cut_region data sources for SlicePlots,
>>> but the issue with FLASH data has been an annoying thorn in my side for a
>>> while, it would be great if we could figure out a way to fix it once and
>>> for all.
>>>
>>> -Nathan
>>>
>>> On Sat, Oct 1, 2016 at 4:27 PM, Jason Galyardt <jason.galyardt at gmail.com
>>> > wrote:
>>>
>>>> Hi Nathan,
>>>>
>>>> Unfortunately, contours won't be enough for me -- I need the actual
>>>> data region to do analysis on the gas contained within (i.e. more than
>>>> visualization). I'm not very familiar with the yt codebase, but I'd like to
>>>> see whether it's worth my time to work on this issue. I'm not very familiar
>>>> with the yt codebase or Cython, but my programming skills are decent; how
>>>> much work do you think this would take? Could I use an existing KDTree
>>>> library, such as the one in SciPy or this C implementation
>>>> <https://github.com/jtsiomb/kdtree>?
>>>>
>>>> Thanks,
>>>> Jason
>>>>
>>>>
>>>> On Sat, Oct 1, 2016 at 3:44 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Unfortunately no. I think I might be able to come back to this issue
>>>>> soonish, we're going to be introducing a new fast periodic KDTree which
>>>>> could be exploited here.
>>>>>
>>>>> For now I think you're going to need to make this plot in a different
>>>>> way, without using a cut_region. Perhaps by overplotting contours?
>>>>>
>>>>> Nathan
>>>>>
>>>>>
>>>>> On Saturday, October 1, 2016, Jason Galyardt <jason.galyardt at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Nathan,
>>>>>>
>>>>>> Is there a previous version of yt in which compound cut regions work?
>>>>>>
>>>>>> Thanks,
>>>>>> Jason
>>>>>>
>>>>>> -----
>>>>>>
>>>>>> Hi Jason,
>>>>>>
>>>>>> This is a known issue:
>>>>>>
>>>>>> https://bitbucket.org/yt_analysis/yt/issues/941/cutregionsel
>>>>>> ector-doesnt-work-with-some
>>>>>>
>>>>>> Unfortunately fixing it will require a modification to the cut_region
>>>>>> implementation to do a better job of searching for cells that belong to the
>>>>>> region for chained selectors.
>>>>>>
>>>>>> -Nathan
>>>>>>
>>>>>> On Fri, Sep 30, 2016 at 1:25 PM, Jason Galyardt <
>>>>>> jason.galyardt at gmail.com> wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> I would like to generate a compound cut region for a FLASH
>>>>>> simulation, but the output I'm getting is odd. I've pasted my simple script
>>>>>> here:
>>>>>>
>>>>>> http://paste.yt-project.org/show/6839/
>>>>>>
>>>>>> It loads a simulation file, creates a compound cut_region (1e4 K < T
>>>>>> < 1e6 K), and applies it to a slice plot. The resulting image is banded, as
>>>>>> you can see by the attached plots. Am I missing something, or is this a
>>>>>> problem related to yt?
>>>>>>
>>>>>> FYI, I get the same behavior when I use a projection plot or only one
>>>>>> of the conditionals (e.g. T < 1e6 K). I've tried both the stock 3.3.1
>>>>>> version, as well as the self-contained development version that I installed
>>>>>> earlier this afternoon using install_script.sh. Let me know if any other
>>>>>> info is useful.
>>>>>>
>>>>>> Cheers,
>>>>>> Jason
>>>>>>
>>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/attachments/20161006/61f9f01a/attachment.html>