Newsgroup: comp.soft-sys.matlab (via The Math Forum)

Topic: Problems with DCT waitForState command
Replies: 1   Last Post: Dec 12, 2012 4:13 PM

onzyone@gmail.com

Posts: 1
Registered: 12/12/12
Re: Problems with DCT waitForState command
Posted: Dec 12, 2012 4:13 PM

On Tuesday, September 16, 2008 3:18:21 AM UTC-4, Edric M Ellis wrote:
> "Eric Solano" <ericssolano@gmail.com> writes:
>

> > I have a script that uses the DCT and calls a job scheduler (MOAB) to schedule
> > several jobs on a cluster. The jobs seem to be scheduled properly and
> > executed by the workers. However, when I try to gather the results for output,
> > the execution seems to be stuck forever at the waitForState command.

>
> Hi Eric,
>
> When things get stuck in "waitForState" for much longer than they should, that
> generally means that execution on the cluster hasn't worked completely
> correctly. In particular, if the state of the job (as far as DCT is concerned)
> never progresses beyond "queued", that is generally an indication that MATLAB on
> the cluster either hasn't been launched successfully, or it cannot write to the
> files in your DataLocation. (By the way, I assume that all jobs fail in this
> way, but that "qstat" indicates that they've completed.)
>
> Are you using the example integration scripts for PBS/Torque? Do you have the
> output files created? If so, they may shed some light on why things aren't
> completing correctly.
>
> (The usual problems with using the example integration scripts are either that
> those scripts aren't on the default MATLAB path of the workers, or
> ClusterMatlabRoot isn't set correctly, or the workers cannot access the
> DataLocation).
>
> Cheers,
>
> Edric.


Hello Edric,

I have run into this issue as well, with the same setup (Moab/Torque). I have confirmed that the jobs ran successfully on the cluster, but the client hung (tested with 1000 distributed jobs). This may be due to the way MATLAB has implemented its "checkjob" code.

Let me know what you think.

Thanks,
Jason.
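The diagnosis above hinges on which state the job is stuck in: a job that never gets past "queued" points to a launch or DataLocation problem, while one that finished on the cluster but never reported back points to the client-side status check. A bounded wait that reports the last state seen makes those two cases distinguishable instead of hanging forever. The sketch below is a minimal, language-neutral illustration in Python, not MATLAB. The function `wait_for_state` and its `get_state` callback are hypothetical stand-ins for querying the job's state. In MATLAB itself, Parallel Computing Toolbox documents an optional timeout argument to `waitForState` (`OK = waitForState(job, 'finished', timeout)`); check the documentation for your release.

```python
import time
from typing import Callable, Tuple


def wait_for_state(
    get_state: Callable[[], str],
    target: str,
    timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, str]:
    """Poll get_state() until it returns `target` or `timeout` seconds pass.

    Hypothetical helper: returns (reached, last_state) so the caller can
    report where the job stalled (e.g. still "queued") instead of
    blocking forever. `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout
    state = get_state()
    while state != target:
        remaining = deadline - clock()
        if remaining <= 0:
            # Timed out: hand back the last state we observed.
            return False, state
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
        state = get_state()
    return True, state
```

For example, `wait_for_state(lambda: "queued", "finished", timeout=60)` returns `(False, "queued")` after about a minute, which matches the "MATLAB on the cluster never started or cannot write to DataLocation" case; a job whose state does reach "finished" returns `(True, "finished")` as soon as it is seen.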




© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.