diff --git a/Mixture Models.ipynb b/Mixture Models.ipynb
index d2ae8e00635d17986f98c18f379df6932b5b6cc1..45d440c0555593d324bff3219bb893adb3fe6e83 100644
--- a/Mixture Models.ipynb
+++ b/Mixture Models.ipynb
@@ -2429,6 +2429,18 @@
     "Note that if the sample clusters were not \"blobs\" of data, but were in concentric circles, the assumption of this method would be false and the method would simply not work well. This is why it is important to understand the underlying assumptions made in the method."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "fbee128c",
+   "metadata": {},
+   "source": [
+    "## Determining the number of clusters\n",
+    "\n",
+    "So far, we have assumed that the number of clusters is known. This is often not the case. One method to determine the number of clusters is to fit the GMM with different numbers of clusters and, after each fit, calculate `gmm.bic(X)`. This method returns the Bayesian Information Criterion (see also the AIC), which should be small when the fit worked well and the number of components is compatible with your sample.\n",
+    "\n",
+    "See a full example here: https://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_selection.html#sphx-glr-auto-examples-mixture-plot-gmm-selection-py"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "7076f779",
@@ -3470,7 +3482,7 @@
     "\n",
     "While the Gaussian Mixture Model presented beforehand had a very strong theoretical background, it still assumes that there is a single correct value for the Gaussian parameters. This may not be the case if the Gaussian model is only approximate (as is very often the case!).\n",
     "\n",
-    "One improvement to the Gaussian Mixture Models to allow for the usaeg of prior knowledge (a prior) on the parameters to be discovered. This helps us, for instance, to determine the ideal number of components automatically, since the number of components itself can be fit with it.\n",
+    "One improvement to the Gaussian Mixture Models is to allow for the usage of prior knowledge (a prior) on the parameters to be discovered. This helps us, for instance, to determine the ideal number of components automatically, since the number of components itself can be fit with it (assuming a prior distribution on them).\n",
     "\n",
     "Optimizing this model becomes even more complicated as the derivation shown before and if we wanted to be fully general, we would need to use very slow algorithms, such as Monte-Carlo sampling to obtain uncertainties with the least amount of extra assumptions. This is often undesirable, since we need fast clustering and often Monte-Carlo sampling is very slow with even more data! An alternative is to assume some underlying prior probability for the means and covariances and find those parameters as well in an optimization algorithm. This is what is done in Variational Inference. Further details can be seen in Bishop (2006), or in a more practical approach, here: https://scikit-learn.org/stable/modules/mixture.html#bgmm\n",
     "\n",
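
The new cell describes BIC-based selection of the number of components. A minimal sketch of that procedure, assuming a 2-D sample `X` such as the blobs generated earlier in the notebook (the `make_blobs` call below is a hypothetical stand-in for that data):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for the notebook's sample: three 2-D blobs.
X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# Fit a GMM for each candidate number of components and record its BIC.
candidates = range(1, 8)
bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in candidates]

# The BIC penalizes extra parameters, so its minimum balances goodness of fit
# against model complexity; for this sample it should land at (or near) 3.
best_k = list(candidates)[int(np.argmin(bics))]
print(best_k, bics)
```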
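
For the variational variant discussed in the second hunk, scikit-learn provides `BayesianGaussianMixture` (linked in the cell). A minimal sketch, reusing the same hypothetical sample `X` and illustrative, not prescriptive, prior settings:

```python
from sklearn.mixture import BayesianGaussianMixture

# Give the model a deliberately generous upper bound on components; the
# Dirichlet prior on the mixture weights pushes unneeded components towards
# zero weight instead of requiring the count to be fixed in advance.
bgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior=1e-2,  # illustrative value: smaller favours fewer active components
    random_state=0,
).fit(X)

print(bgmm.weights_.round(3))  # weights of unused components end up near zero
labels = bgmm.predict(X)       # hard cluster assignments, as with GaussianMixture
```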