put in concurrency

Commit 37123a4a authored 6 years ago by Astrid Muennich
--- a/notebooks/Tutorial/calversion.ipynb
+++ b/notebooks/Tutorial/calversion.ipynb
@@ -10,7 +10,29 @@
    "\n",
    "A small example of how to adapt a notebook to run with the offline calibration package \"pycalibration\".\n",
    "\n",
-    "The first cell contains all parameters that should be exposed to the command line."
+    "The first cell contains all parameters that should be exposed to the command line.\n",
+    "\n",
+    "To run this notebook with several different input parameters in parallel by submitting multiple SLURM jobs, for example with various random seeds, we can do the following:\n",
+    "\n",
+    "xfel-calibrate TUTORIAL TEST --random-seed 1,2,3,4\n",
+    "\n",
+    "or\n",
+    "\n",
+    "xfel-calibrate TUTORIAL TEST --random-seed 1-5\n",
+    "\n",
+    "will produce 4 jobs:\n",
+    "\n",
+    "Parsed input 1,2,3,4 to [1, 2, 3, 4]\n",
+    "\n",
+    "Submitted job: 1169340\n",
+    "\n",
+    "Submitted job: 1169341\n",
+    "\n",
+    "Submitted job: 1169342\n",
+    "\n",
+    "Submitted job: 1169343\n",
+    "\n",
+    "Submitted the following SLURM jobs: 1169340,1169341,1169342,1169343"
   ]
  },
  {
@@ -23,7 +45,7 @@
   "source": [
    "out_folder = \"/gpfs/exfel/data/scratch/amunnich/tutorial\" # output folder\n",
    "sensor_size = [10, 30] # defining the picture size\n",
-    "random_seed = 2345 # random seed for filling of fake data array. Change it to produce different results.\n",
+    "random_seed = [2345] # random seed for filling of fake data array. Change it to produce different results; a range is allowed.\n",
    "runs = 500 # how many iterations to fill histograms\n",
    "cluster_profile = \"tutorial\" "
   ]
@@ -32,7 +54,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "First include what we need and set up the cluster profile. Everything that has a written response in a cell will show up in the report, e.g. prints but also return values or errors."
+    "First include what we need and set up the cluster profile for parallel processing on one node utilising more than one core.\n",
+    "Everything that has a written response in a cell will show up in the report, e.g. prints but also return values or errors."
   ]
  },
  {
@@ -105,16 +128,19 @@
   },
   "outputs": [],
   "source": [
+    "# To run several random seeds in parallel, the parameter has to be a list. This notebook needs a single value,\n",
+    "# so we use the first entry of the list.\n",
+    "random_seed_single = random_seed[0]\n",
    "fake_data = []\n",
    "for i in range(runs):\n",
-    "    fake_data.append(data_creation(random_seed+10*i))"
+    "    fake_data.append(data_creation(random_seed_single+10*i))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Plot the random image. everything we write here in the markup cells will show up as text in the report."
+    "Create some random images and plot them. Everything we write here in the markup cells will show up as text in the report."
   ]
  },
  {
@@ -183,7 +209,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "To parallelise jobs we use the ipyparallel client."
+    "To parallelise jobs we use the ipyparallel client. This starts an ipcluster on one node, using the number of cores specified in xfel_calibrate/notebooks.py."
   ]
  },
  {

 %% Cell type:markdown id: tags:
 # Tutorial Calculation #
 Author: Astrid Muennich, Version 0.1
 A small example of how to adapt a notebook to run with the offline calibration package "pycalibration".
 The first cell contains all parameters that should be exposed to the command line.
+To run this notebook with several different input parameters in parallel by submitting multiple SLURM jobs, for example with various random seeds, we can do the following:
+xfel-calibrate TUTORIAL TEST --random-seed 1,2,3,4
+or
+xfel-calibrate TUTORIAL TEST --random-seed 1-5
+will produce 4 jobs:
+Parsed input 1,2,3,4 to [1, 2, 3, 4]
+Submitted job: 1169340
+Submitted job: 1169341
+Submitted job: 1169342
+Submitted job: 1169343
+Submitted the following SLURM jobs: 1169340,1169341,1169342,1169343
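The expansion of a seed specification into one value per job can be sketched as follows. This is a minimal illustration, not the parser used by `xfel-calibrate`: the helper name `parse_seed_spec` is hypothetical, and the exclusive upper bound for ranges is an assumption inferred from the statement that `1-5` yields 4 jobs.

```python
def parse_seed_spec(spec):
    """Hypothetical helper: expand '1,2,3,4' or '1-5' into a list of ints.

    Assumption: a range 'a-b' excludes its upper bound, so '1-5'
    expands to four values. Negative numbers are not handled.
    """
    values = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, stop = part.split("-", 1)
            values.extend(range(int(start), int(stop)))
        else:
            values.append(int(part))
    return values


print(parse_seed_spec("1,2,3,4"))
print(parse_seed_spec("1-5"))
```

Under these assumptions both specifications expand to the same four seeds, which is why either command line would lead to four submitted jobs.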
 %% Cell type:code id: tags:
 ``` python
 out_folder = "/gpfs/exfel/data/scratch/amunnich/tutorial" # output folder
 sensor_size = [10, 30] # defining the picture size
-random_seed = 2345 # random seed for filling of fake data array. Change it to produce different results.
+random_seed = [2345] # random seed for filling of fake data array. Change it to produce different results; a range is allowed.
 runs = 500 # how many iterations to fill histograms
 cluster_profile = "tutorial"
 ```
 %% Cell type:markdown id: tags:
-First include what we need and set up the cluster profile. Everything that has a written response in a cell will show up in the report, e.g. prints but also return values or errors.
+First include what we need and set up the cluster profile for parallel processing on one node utilising more than one core.
+Everything that has a written response in a cell will show up in the report, e.g. prints but also return values or errors.
 %% Cell type:code id: tags:
 ``` python
 import matplotlib
 %matplotlib inline
 import numpy as np
 import matplotlib.pyplot as plt
 # if not using slurm: make sure a cluster is running with
 # ipcluster start --n=4 --profile=tutorial
 # give it a while to start
 from ipyparallel import Client
 print("Connecting to profile {}".format(cluster_profile))
 view = Client(profile=cluster_profile)[:]
 view.use_dill()
 ```
 %% Output
    Connecting to profile tutorial
    <AsyncResult: use_dill>
 %% Cell type:markdown id: tags:
 ## Create some random data
 %% Cell type:code id: tags:
 ``` python
 def data_creation(random_seed):
    np.random.seed(random_seed)  # call seed(); assigning to it would replace the function without seeding
    return np.random.random((sensor_size))
 ```
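Reproducibility depends on the generator actually being seeded: the same seed must always give the same image. A minimal sketch of that property, assuming NumPy's `default_rng` Generator API in place of the global `np.random` state. The function `make_image` is a hypothetical stand-in for `data_creation`.

```python
import numpy as np

sensor_size = [10, 30]  # same picture size as in the parameter cell


def make_image(seed):
    """Hypothetical stand-in for data_creation using a local generator."""
    rng = np.random.default_rng(seed)
    return rng.random(tuple(sensor_size))


a = make_image(2345)
b = make_image(2345)  # same seed as a
c = make_image(2355)  # different seed

print(np.array_equal(a, b))
print(np.array_equal(a, c))
```

A local generator per call avoids any dependence on shared global state, which can matter once work is distributed over several engines.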
 %% Cell type:code id: tags:
 ``` python
+# To run several random seeds in parallel, the parameter has to be a list. This notebook needs a single value,
+# so we use the first entry of the list.
+random_seed_single = random_seed[0]
 fake_data = []
 for i in range(runs):
-    fake_data.append(data_creation(random_seed+10*i))
+    fake_data.append(data_creation(random_seed_single+10*i))
 ```
 %% Cell type:markdown id: tags:
-Plot the random image. everything we write here in the markup cells will show up as text in the report.
+Create some random images and plot them. Everything we write here in the markup cells will show up as text in the report.
 %% Cell type:code id: tags:
 ``` python
 plt.subplot(211)
 plt.imshow(fake_data[0], interpolation="nearest")
 plt.title('Random Image')
 plt.ylabel('sensor height')
 plt.subplot(212)
 plt.imshow(fake_data[5], interpolation="nearest")
 plt.xlabel('sensor width')
 plt.ylabel('sensor height')
 plt.subplots_adjust(bottom=0.1, right=0.8, top=0.9)
 cax = plt.axes([0.85, 0.1, 0.075, 0.9])
 plt.colorbar(cax=cax).ax.set_ylabel("# counts")
 plt.show()
 ```
 %% Output
 %% Cell type:markdown id: tags:
 These plots show two randomly filled sensor images. We can use markup cells also as captions for images.
 %% Cell type:markdown id: tags:
 ## Simple Analysis
 %% Cell type:code id: tags:
 ``` python
 mean = []
 std = []
 for im in fake_data:
    mean.append(im.mean())
    std.append(im.std())
 ```
 %% Cell type:markdown id: tags:
-To parallelise jobs we use the ipyparallel client.
+To parallelise jobs we use the ipyparallel client. This starts an ipcluster on one node, using the number of cores specified in xfel_calibrate/notebooks.py.
 %% Cell type:code id: tags:
 ``` python
 from functools import partial
 def parallel_stats(input):
    return input.mean(), input.std()
 p = partial(parallel_stats)
 results = view.map_sync(p, fake_data)
 p_mean= [ x[0] for x in results ]
 p_std= [ x[1] for x in results ]
 ```
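The map-then-unpack pattern used above can be tried without a running ipcluster. The sketch below swaps in the standard library's `ThreadPoolExecutor.map` for `view.map_sync` and plain lists for the NumPy images; it only illustrates that results come back in input order as `(mean, std)` tuples, and it is not a replacement for the ipyparallel setup.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def stats(values):
    # population standard deviation, matching NumPy's default std()
    return mean(values), pstdev(values)


data = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0], [0.0, 10.0]]

with ThreadPoolExecutor(max_workers=2) as pool:
    # map() yields results in the same order as the inputs
    results = list(pool.map(stats, data))

means = [m for m, _ in results]
stds = [s for _, s in results]
print(means)
print(stds)
```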
 %% Cell type:markdown id: tags:
 We calculate the mean value of all images, as well as the standard deviation.
 %% Cell type:code id: tags:
 ``` python
 plt.subplot(221)
 plt.hist(mean, 50)
 plt.xlabel('mean')
 plt.ylabel('counts')
 plt.title('Mean value')
 plt.subplot(222)
 plt.hist(p_mean, 50)
 plt.xlabel('mean parallel')
 plt.ylabel('counts')
 plt.title('Parallel Mean value')
 plt.subplot(223)
 plt.hist(std, 50)
 plt.xlabel('std')
 plt.ylabel('counts')
 plt.title('Std value')
 plt.subplot(224)
 plt.hist(p_std, 50)
 plt.xlabel('std parallel')
 plt.ylabel('counts')
 plt.title('Parallel Std value')
 plt.subplots_adjust(top=0.99, bottom=0.01, left=0.01, right=0.99, hspace=0.7, wspace=0.35)
 plt.show()
 ```
 %% Output
 %% Cell type:code id: tags:
 ``` python
 ```

--- a/xfel_calibrate/notebooks.py
+++ b/xfel_calibrate/notebooks.py
@@ -84,7 +84,7 @@ notebooks = {
            "TUTORIAL": {
                       "TEST": {
                               "notebook": "notebooks/Tutorial/calversion.ipynb",
-                               "concurrency": {"parameter": None,
+                               "concurrency": {"parameter": "random_seed",
                                               "default concurrency": None,
                                               "cluster cores": 32},
                               },
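How a concurrency parameter might translate into separate jobs can be sketched as follows. This is an assumption-laden illustration, not the `xfel-calibrate` implementation: the helper `fan_out` is hypothetical, and passing each job a single-element list is inferred from the notebook reading `random_seed[0]`.

```python
# Concurrency entry as it appears in the notebooks dictionary above.
concurrency = {"parameter": "random_seed",
               "default concurrency": None,
               "cluster cores": 32}


def fan_out(base_params, concurrency, values):
    """Hypothetical helper: one parameter set per value of the concurrency parameter."""
    name = concurrency["parameter"]
    if name is None:
        # no concurrency parameter configured: a single job
        return [dict(base_params)]
    # assumption: each job receives a one-element list for the parameter
    return [{**base_params, name: [value]} for value in values]


jobs = fan_out({"runs": 500}, concurrency, [1, 2, 3, 4])
for params in jobs:
    print(params)
```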