[Yt-dev] Fw: X1024 jobs on Ranger

Matthew Turk matthewturk at gmail.com
Thu Apr 30 13:17:39 PDT 2009


Hi Stephen,

Not sure what to tell you.  However, I would like to note that HOP
does not in any way pool file access.  Projections did -- they do not
any more as of yesterday or the day before -- but HOP does not.  You
can try to implement this via the preload command, which you will also
see inside the profiling module.  We could improve IO by making the
DataQueue object aware of which grids will be accessed and then doing
pool-on-demand: you'd call 'preload' so it knows which grids to pool
access to, and then when *one* is accessed, all the others in that CPU
file would get pulled as well.
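
A minimal sketch of the pool-on-demand idea described above, not the actual
DataQueue code.  The class name PoolOnDemandQueue, the grid_to_file mapping,
and the read_file callable are all hypothetical stand-ins; the only point is
that 'preload' records intent without doing IO, and the first access to any
grid in a CPU file pulls every preloaded grid from that file in one pass.

```python
from collections import defaultdict


class PoolOnDemandQueue:
    """Hypothetical stand-in for a DataQueue-like object; not yt's real API."""

    def __init__(self, grid_to_file, read_file):
        # grid_to_file: dict mapping grid id -> name of the .cpu file holding it
        # read_file: callable(filename, grid_ids) -> {grid_id: data};
        #            assumed to open the file once per call
        self.grid_to_file = grid_to_file
        self.read_file = read_file
        self.pending = defaultdict(set)  # filename -> grid ids awaiting a pooled read
        self.cache = {}                  # grid id -> data already read

    def preload(self, grid_ids):
        # Only record which grids will be needed; no IO happens here.
        for gid in grid_ids:
            if gid not in self.cache:
                self.pending[self.grid_to_file[gid]].add(gid)

    def get(self, grid_id):
        if grid_id not in self.cache:
            fn = self.grid_to_file[grid_id]
            # Pull this grid plus every preloaded grid in the same file at once.
            wanted = self.pending.pop(fn, set()) | {grid_id}
            self.cache.update(self.read_file(fn, sorted(wanted)))
        return self.cache[grid_id]
```

With this shape, each CPU file is opened at most once per preload batch,
however many of its grids are later requested, which should cut the number
of metadata operations hitting the filesystem.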

Unfortunately, I cannot give my time to this today, but maybe you
could start on that path and see what you can come up with?

-Matt

On Thu, Apr 30, 2009 at 1:10 PM, Stephen Skory <stephenskory at yahoo.com> wrote:
>
> I'm breaking Ranger...
>
> Before I reply, do any of you have something intelligent to add that I can say? Because the .cpu files aren't spatially restricted, each thread of parallel HOP may have to access many of the .cpu files, which is how the MDS gets pummeled.
>
>  _______________________________________________________
> sskory at physics.ucsd.edu           o__  Stephen Skory
> http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
> ________________________________(_)_\(_)_______________
>
>
>
> ----- Forwarded Message ----
>> From: Tommy Minyard <minyard at tacc.utexas.edu>
>> To: "sskory at physics.ucsd.edu" <sskory at physics.ucsd.edu>
>> Sent: Thursday, April 30, 2009 12:54:49 PM
>> Subject: X1024 jobs on Ranger
>>
>> Hello Stephen,
>>
>> In our monitoring of Ranger, we've noticed that some of your recent jobs named
>> x1024 seem to be causing an abnormally high load on the /scratch filesystem
>> meta-data server (MDS).  From our monitoring, it appears that the MDS load goes
>> way up when your job first begins to run, for up to the first 30 minutes to an
>> hour, but then it drops back down to a more reasonable load after the job has
>> been running for a while.
>>
>> Do you have any idea what may be triggering such a high load from the
>> application you are running?  It does not seem to cause any major problems or
>> generate errors; however, the filesystem access becomes much more sluggish when
>> the MDS load is so high.  If you could give us a few more details that might
>> help explain the high load or point us to the source code for your application,
>> we want to check and confirm that the MDS is acting as it should.
>>
>> Thanks,
>> Tommy
>>
>> ____________________________________________________________________
>> Tommy Minyard, Ph.D. - Assoc. Director    (512) 232-6578
>> Advanced Computing Systems Group          (512) 475-9445 (fax)
>> Texas Advanced Computing Center           http://www.tacc.utexas.edu
>> The University of Texas at Austin        minyard at tacc.utexas.edu
>
> _______________________________________________
> Yt-dev mailing list
> Yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>


