I have a one-dimensional set of data points whose probability density I want to parameterise. I have reason to believe that a Gaussian mixture model would be a good way to accomplish this, so I'm trying to use scikit-learn's GaussianMixture class to fit the parameters and weights of two Gaussian distributions.
Toy example:
import numpy as np
from sklearn.mixture import GaussianMixture

# Two zero-mean Gaussian samples with different spreads
stdev_1 = 5
stdev_2 = 30
gaussian_data_1 = stdev_1 * np.random.randn(1000)
gaussian_data_2 = stdev_2 * np.random.randn(1000)
data = np.concatenate([gaussian_data_1, gaussian_data_2])

# GaussianMixture expects 2-D input of shape (n_samples, n_features)
model = GaussianMixture(n_components=2)
data_2d = data.reshape(-1, 1)
model.fit(data_2d)

print("Estimated means:", model.means_[:, 0])
print("Estimated stdevs:", model.covariances_[:, 0, 0] ** 0.5)
print("Estimated weights:", model.weights_)
The resulting model gives reasonable estimates of the two Gaussians. I put in means of zero and standard deviations of 5 and 30, each with weight 0.5 (both components have 1000 data points), and it finds means of [-0.0715483, -0.06263915], standard deviations of [5.46757321, 30.77977466], and weights of [0.53427173, 0.46572827].
So far so good.
However, in my application I know that the underlying distribution is unimodal, and I really only want to find out which combination of standard deviations and weights fits best. Hence, I'd like to force all components of the model to share the same mean, for instance by simply passing the mean in myself, and have it optimise only the weights and standard deviations (variances).
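To make this concrete, here is roughly the kind of hand-rolled EM I could fall back on: a minimal sketch with a hypothetical fit_fixed_mean_mixture helper, assuming all component means are pinned at a single known value (zero here), so that only the weights and standard deviations are re-estimated in each iteration:
import numpy as np
from scipy.stats import norm

def fit_fixed_mean_mixture(data, n_components, mean=0.0, n_iter=200, seed=0):
    # EM for a 1-D Gaussian mixture whose component means are all fixed at `mean`;
    # only the weights and standard deviations are updated.
    rng = np.random.default_rng(seed)
    weights = np.full(n_components, 1.0 / n_components)
    stdevs = data.std() * rng.uniform(0.5, 2.0, n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        dens = np.stack([w * norm.pdf(data, loc=mean, scale=s)
                         for w, s in zip(weights, stdevs)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and stdevs; the means stay fixed
        nk = resp.sum(axis=0)
        weights = nk / len(data)
        stdevs = np.sqrt((resp * (data[:, None] - mean) ** 2).sum(axis=0) / nk)
    return weights, stdevs

weights, stdevs = fit_fixed_mean_mixture(data, n_components=2)
But I'd rather not maintain my own EM loop if there is an established way to do this.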
Is this possible with scikit-learn? The GaussianMixture class seems to be designed with classification in mind, whereas I'm really using it to parameterise a distribution, so there may be a better approach that I'm not aware of.
If this is not feasible with scikit-learn, are there any suggestions on how to do it (preferably with more than 2 Gaussians)?