
Launch jobs with variable nice value

Merged: Philipp Schmidt requested to merge feat/degressive-slurm-priority into master

Description

Based on the discussion in https://git.xfel.eu/calibration/planning/-/issues/110, this MR introduces a flexible --nice value passed to sbatch, depending on the number of jobs already running in the partition selected for this instrument. Originally this was only meant for active proposals, but the code is cleaner when it simply uses the partition value determined earlier, which is currently changed by ITDM's script shortly after submission anyway. I will clarify this point before merging.
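
As a rough illustration of the mechanism (not the exact code in this MR; the helper names and penalty factor below are placeholders), counting our running jobs in a partition with squeue and passing the derived penalty to sbatch could look like this:

    import subprocess

    def count_running_jobs(partition: str) -> int:
        # Ask Slurm for the names of our jobs in this partition, one per line.
        res = subprocess.run(
            ['squeue', '-h', '-o', '%.20j', '-p', partition, '--me'],
            capture_output=True, text=True)
        if res.returncode != 0:
            return 0  # No penalty if squeue fails.
        return len(res.stdout.splitlines())

    def nice_for(partition: str, job_penalty: int = 2) -> int:
        # Each job already running in the partition makes the next one a bit "nicer".
        return count_running_jobs(partition) * job_penalty

    # Hypothetical submission of a correction job with the computed penalty:
    # subprocess.run(['sbatch', f'--nice={nice_for("upex-middle")}', 'correct.sh'])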

How Has This Been Tested?

Tests! Unit tests!

Types of changes

  • New feature (non-breaking change which adds functionality)

Reviewers

@roscar @kluyvert @ahmedk

Merge request reports


Activity

  • Philipp Schmidt changed milestone to %3.5.2

  • added 1 commit

    • 23a04dc3 - Try it with explicit returncode

    • Resolved by Thomas Kluyver

      Nice! I've been wary of trying to manage priorities because that seems like Slurm's job, but this looks pretty clean and testable.

      At present, this mechanism is used to give dark processing priority over correction. Is it intentional to get rid of that, or should we include an adjustment for dark runs?

      Is it possible that the fake subprocess doesn't catch the subprocess call in CI? Do you have the same version of Python (3.8.10) in your test environment?

  • This is pretty weird. I thought it could have been an issue caused by the Singularity containers, but I checked the pytest-subprocess code and it works by faking Popen instead of messing with PATH and adding in shims, so it's really not clear to me what could cause this.

    I've got some semi-urgent stuff to work on right now for Thomas (M) so I'll try and finish that asap and check this out later today (as well as some pending MRs).

  • Philipp Schmidt added 2 commits

    • 4dcd37e0 - (fixup) Slightly nicer testing code
    • 5f30852e - Add slurm priority penalties to webservice configuration

  • Thomas Kluyver added 1 commit

    • 1724fae1 - Make monkeypatching of function temporary

  • Thomas Kluyver added 1 commit

    • 6076ec53 - Make monkeypatching of function temporary

  • Thomas Kluyver resolved all threads

  • We figured out that another test was replacing the run_proc_async function, so the new tests failed if that test was run before them in the same session. I've fixed that by using pytest's monkeypatch fixture to replace it temporarily - when the test in question finishes, monkeypatch sets it back to the original value.
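
    A minimal sketch of that pattern, assuming the webservice module path and pytest-asyncio for the async test (the fake's return value is made up):

        import pytest
        from webservice import webservice  # assumed module layout

        @pytest.mark.asyncio
        async def test_run_proc_async_is_faked(monkeypatch):
            async def fake_run_proc_async(cmd):
                # Pretend squeue succeeded and reported three running jobs.
                return 0, 'job1\njob2\njob3\n'

            # monkeypatch reverts this when the test finishes, so the
            # replacement cannot leak into later tests in the same session.
            monkeypatch.setattr(webservice, 'run_proc_async', fake_run_proc_async)

            code, names = await webservice.run_proc_async(['squeue'])
            assert code == 0 and names.count('\n') == 3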

  • With @kluyvert knowing how to properly use --nice, we finally learned that it is directly subtracted from the priority. I was going to rebalance the numbers and the formula after this to ensure that active proposals cannot fall below exfel jobs, but it seems partitions are assigned a PriorityTier, which is not the same as a job's Priority.

    Looking at this value's documentation:

    PriorityTier: Configure the partition's PriorityTier setting relative to other partitions to control the preemptive behavior when PreemptType=preempt/partition_prio. This option is not relevant if PreemptType=preempt/qos. If two jobs from two different partitions are allocated to the same resources, the job in the partition with the greater PriorityTier value will preempt the job in the partition with the lesser PriorityTier value. If the PriorityTier values of the two partitions are equal then no preemption will occur. The default PriorityTier value is 1.

    NOTE: In addition to being used for partition based preemption, PriorityTier also has an effect on scheduling. The scheduler will evaluate jobs in the partition(s) with the highest PriorityTier before evaluating jobs in other partitions, regardless of which jobs have the highest Priority. The scheduler will consider the job priority when evaluating jobs within the partition(s) with the same PriorityTier.

    The configuration on Maxwell right now is:

    • upex-high: 1000
    • upex-middle: 900
    • exfel: 500

    I would conclude that we can freely subtract values from a job's priority through --nice without changing the priority order of partitions, i.e. we do not have to worry that subtracting too much from an upex-middle job would make it fall below exfel (a small sketch of this ordering follows below).

    Do you agree with this conclusion?
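
    As a sanity check of that reading, a toy model of the ordering described in the quoted docs (the priority numbers are made up; real priorities come from Slurm's multifactor plugin):

        from dataclasses import dataclass

        PRIORITY_TIER = {'upex-high': 1000, 'upex-middle': 900, 'exfel': 500}

        @dataclass
        class Job:
            partition: str
            priority: int  # multifactor job priority
            nice: int = 0  # value passed via sbatch --nice

        def scheduling_key(job):
            # Partitions with a higher PriorityTier are evaluated first; only
            # within the same tier does (priority - nice) decide the order.
            return (PRIORITY_TIER[job.partition], job.priority - job.nice)

        jobs = [
            Job('upex-middle', priority=10000, nice=5000),  # heavily penalised
            Job('exfel', priority=50000),
        ]
        # The upex-middle job is still evaluated first despite the large nice value.
        print(max(jobs, key=scheduling_key).partition)  # -> upex-middle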

  • That's how I would read the bit of the docs you copied as well.

    penalty of 25 running jobs.

    :param partition: Partition to run jobs in.
    :param instrument: Instrument to run jobs for.
    :param cycle: Cycle of proposal to run jobs for.
    :param job_penalty: Scaling factor per job, 2 by default.
    :param commissioning_penalty: Base penalty for commissioning,
        1250 by default.
    :return: Nice value to be passed to sbatch --nice
    """

    # List all names for jobs running in the specified partition.
    code, job_names = await run_proc_async(
        ['squeue', '-h', '-o', '%.20j', '-p', partition, '--me'])

    if code != 0:
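
For context, a rough guess at how the rest of this computation might combine the parameters named in the docstring above; this is not the actual code in the MR, and compute_nice and the example numbers are purely illustrative:

    def compute_nice(num_jobs: int, commissioning: bool,
                     job_penalty: int = 2,
                     commissioning_penalty: int = 1250) -> int:
        # Each running job adds a small penalty; commissioning proposals get a
        # large base penalty on top.
        nice = num_jobs * job_penalty
        if commissioning:
            nice += commissioning_penalty
        return nice

    # e.g. 25 running jobs in a commissioning cycle:
    print(compute_nice(25, True))  # -> 1300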
  • LGTM besides a minor comment.

  • added 1 commit

    • 77d6c5d0 - (fixup) Log error when squeue returns non-zero
