
python - Logistic curve produced by curve_fit is a straight line - Stack Overflow


I'm trying to produce a Sigmoid/Logistic curve from some input data. I borrowed code from this post and this for plotting.

The result is the following:

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x, L, x0, k, b):
    return L / (1 + np.exp(-k * (x - x0))) + b

data = np.loadtxt("data.csv", delimiter=",")
xdata = data[0]
ydata = data[1]

p0 = [max(ydata), np.median(xdata), 1, min(ydata)]  # this is a mandatory initial guess
fitting_parameters, covariance = curve_fit(sigmoid, xdata, ydata, p0, method='dogbox', maxfev=10000)

plt.plot(xdata, ydata, 'o', label='Data')
plt.plot(xdata, sigmoid(xdata, *fitting_parameters), '-', label='Fit')
plt.legend()
plt.show()

This produces a straight line instead of a logistic fit:

What am I missing? I know the data is a bit coarse, but is that the cause?

EDIT: Here is the raw data, if useful:

1.15102,1.17397,1.18334,1.18484,1.2073,1.25081,1.26446,1.26535,1.26654,1.29653,1.30118,1.30991,1.32608,1.39597,1.39721,1.41225,1.415,1.41989,1.47602,1.19992,1.23148,1.2895,1.31759,1.33068,1.34391,1.35604,1.35879,1.37359,1.38695,1.40233,1.41753,1.42323,1.43474,1.44706,1.48247,1.50033,1.52272,1.59789,1.09956,1.10712,1.13576,1.16265,1.16993,1.18129,1.19587,1.1973,1.20428,1.23916,1.24522,1.2505,1.26135,1.26542,1.27122,1.2736,1.27456,1.30306,1.34639,1.16272,1.18929,1.28076,1.28145,1.28513,1.28708,1.30215,1.30236,1.30887,1.31634,1.37677,1.37745,1.38119,1.38846,1.43016,1.43046,1.43234,1.48051,1.54508
0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95

asked Mar 14 at 15:39 by Tyler Shellberg; edited Mar 14 at 15:44
  • You might have a bad initial guess and it always helps to compute the Jacobian and pass that argument. But without data to reproduce your results it will be difficult to help. – jared Commented Mar 14 at 15:42
  • I've appended the input data to the original post. I'm not very familiar with statistics. What is the importance of the 'initial guess'? What is the Jacobian, and how would I compute or pass it? – Tyler Shellberg Commented Mar 14 at 15:44
  • The data in that graph is positively correlated, but, beyond that, you'd have a hard job fitting any smooth curve to it. Are you sure that it's correct? – lastchance Commented Mar 14 at 15:47
  • It's the best I can get at the moment. For now I do not need perfection - just a 'good enough' logistic/sigmoid fit. – Tyler Shellberg Commented Mar 14 at 15:49
  • @mozway Numpy 2.2.3, Scipy version 1.15.2. – Tyler Shellberg Commented Mar 14 at 16:04

3 Answers

When I adjust your plotting code to zoom out, it fits nicely, in the scheme of things.

plt.plot(xdata, ydata, 'o', label='Data')
xlim = (-3, 3)
x = np.linspace(*xlim, 300)
plt.plot(x, sigmoid(x, *fitting_parameters), '-', label='Fit')
plt.xlim(*xlim)
plt.legend()
plt.show()

I suppose you want to see something like this:

p0 = [1, np.median(xdata), 15, 0]  # a hand-picked initial guess

plt.plot(xdata, ydata, 'o', label='Data')
xlim = (min(xdata)-1, max(xdata)+1)
x = np.linspace(*xlim, 300)
plt.plot(x, sigmoid(x, *p0), '-', label='Fit')
plt.xlim(*xlim)
plt.legend()
plt.show()

But if you try to use curve_fit to optimize the parameters from that guess, it makes the solution look like a straight line again, fitting to only a small portion of the curve. I think the best fit according to the least squares criterion just won't look like what you want it to look like. You would have to change the criterion from "minimize the sum of squared residuals" to something else, or add constraints on what the parameters can be (e.g. fix L and b). See, for example, scipy/scipy#21102 for some information about how to use SciPy's other optimization tools to do curve fitting manually, or you might want to look into the LMFIT package.
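For the constrained route, `curve_fit` itself accepts a `bounds` argument (supported by the 'dogbox' method), which can effectively pin L and b. A minimal sketch on synthetic data (not the question's data; the bound values and true parameters are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, L, x0, k, b):
    return L / (1 + np.exp(-k * (x - x0))) + b

# Synthetic data from a known logistic curve, with a little noise
rng = np.random.default_rng(0)
x = np.linspace(1.0, 1.7, 80)
y = sigmoid(x, 1, 1.35, 12, 0) + rng.normal(0, 0.02, x.size)

# Tight bounds effectively fix L ~= 1 and b ~= 0,
# leaving only x0 and k free in practice
lower = [0.999, -np.inf, 1e-6, -0.001]
upper = [1.001, np.inf, np.inf, 0.001]
popt, _ = curve_fit(sigmoid, x, y, p0=[1, 1.3, 10, 0],
                    bounds=(lower, upper), method='dogbox')
```

Tight bounds around 1 and 0 reduce the problem to the two parameters that matter here, x0 and k, without changing the model function's signature.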


Update:

Maybe what you want is to fix L = 1 and b = 0 to reflect the fact that you're working with percentages.

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

xdata = np.asarray([1.15102,1.17397,1.18334,1.18484,1.2073,1.25081,1.26446,1.26535,1.26654,1.29653,1.30118,1.30991,1.32608,1.39597,1.39721,1.41225,1.415,1.41989,1.47602,1.19992,1.23148,1.2895,1.31759,1.33068,1.34391,1.35604,1.35879,1.37359,1.38695,1.40233,1.41753,1.42323,1.43474,1.44706,1.48247,1.50033,1.52272,1.59789,1.09956,1.10712,1.13576,1.16265,1.16993,1.18129,1.19587,1.1973,1.20428,1.23916,1.24522,1.2505,1.26135,1.26542,1.27122,1.2736,1.27456,1.30306,1.34639,1.16272,1.18929,1.28076,1.28145,1.28513,1.28708,1.30215,1.30236,1.30887,1.31634,1.37677,1.37745,1.38119,1.38846,1.43016,1.43046,1.43234,1.48051,1.54508])
ydata = np.asarray([0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95])

p0 = [np.median(xdata), 15]  # a hand-picked initial guess
fitting_parameters, covariance = curve_fit(sigmoid, xdata, ydata, p0, method='dogbox', maxfev=10000)

plt.plot(xdata, ydata, 'o', label='Data')
xlim = (min(xdata)-1, max(xdata)+1)
x = np.linspace(*xlim, 300)
plt.plot(x, sigmoid(x, *fitting_parameters), '-', label='Fit')
plt.xlim(*xlim)
plt.legend()
plt.show()

Then the best fit is:


In response to your follow-up question, to solve for x at a particular y, you can invert the equation algebraically:

p = 0.2
x0, k = fitting_parameters
-np.log(1 / p - 1) / k + x0
# 1.1584171856437602

Or you can solve numerically:

from scipy.optimize import root_scalar

def f(x, p):
    return sigmoid(x, *fitting_parameters) - p

p = 0.2
root_scalar(f, bracket=[0.5, 2.0], args=(p,))
#       converged: True
#            flag: converged
#  function_calls: 12
#      iterations: 11
#            root: 1.1584171856442531
#          method: brentq

The raw data appear to be several repeats of the same type of measurement with different underlying parameters. Isolating one of the sequences gives something vaguely plausible (there is a lot of noise in some of them). The reason the plot looks so crazy is that y restarts from 0.05 to 0.95 several times, which would be immediately apparent if the data points were joined by lines.
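The restart structure is easy to recover programmatically. A small sketch, assuming (as in the question's data) that each run of y climbs monotonically and a new run begins wherever y drops:

```python
import numpy as np

# The question's y-values: four runs of 0.05..0.95 in steps of 0.05
ydata = np.tile(np.linspace(0.05, 0.95, 19), 4)

# A run restarts wherever y decreases; split the array at those points
restarts = np.where(np.diff(ydata) < 0)[0] + 1
runs = np.split(ydata, restarts)
```

The same `restarts` indices can be used with `np.split(xdata, restarts)` so each x-series can then be fitted on its own.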

The relevant data (2nd series in the compound dataset) that can be fitted are:

x y model error
1.19992 0.05 0.03939 0.010607
1.23148 0.1 0.06522 0.034780
1.2895 0.15 0.156365 -0.006365
1.31759 0.2 0.229256 -0.029256
1.33068 0.25 0.270499 -0.020499
1.34391 0.3 0.316629 -0.016629
1.35604 0.35 0.362380 -0.012380
1.35879 0.4 0.373147 0.026853
1.37359 0.45 0.433025 0.016975
1.38695 0.5 0.488865 0.011135
1.40233 0.55 0.553406 -0.003406
1.41753 0.6 0.615478 -0.015478
1.42323 0.65 0.637928 0.012072
1.43474 0.7 0.681397 0.018603
1.44706 0.75 0.724653 0.025347
1.48247 0.8 0.826917 -0.026917
1.50033 0.85 0.865842 -0.015842
1.52272 0.9 0.903933 -0.003933
1.59789 0.95 0.970901 -0.020901

I chose a much simpler model for the sigmoid, namely s(x) = (tanh(A*(x-x0))+1)/2.
I don't think the data are adequate to fit any more parameters than that. This is the graph of the least-squares model fit with A = 8.42 and x0 = 1.39; I slightly prefer a by-eye fit of A = 8.
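As a sketch of this tanh fit, using the second series tabulated above (the p0 values are guesses; the answer reports roughly A = 8.42, x0 = 1.39):

```python
import numpy as np
from scipy.optimize import curve_fit

# Second series from the question's compound dataset
x = np.array([1.19992, 1.23148, 1.2895, 1.31759, 1.33068, 1.34391, 1.35604,
              1.35879, 1.37359, 1.38695, 1.40233, 1.41753, 1.42323, 1.43474,
              1.44706, 1.48247, 1.50033, 1.52272, 1.59789])
y = np.linspace(0.05, 0.95, 19)

def s(x, A, x0):
    # Two-parameter sigmoid: equivalent to a logistic with L=1, b=0, k=2A
    return (np.tanh(A * (x - x0)) + 1) / 2

popt, _ = curve_fit(s, x, y, p0=[8, np.median(x)])
```

With L and b removed from the model there is nothing left for the optimizer to flatten, so the fit stays S-shaped even on noisy runs.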

Fitting a model to each data series in the dataset will give much more sensible results. The average fit over the entire dataset taken whole is pretty meaningless (and rather sensitive to initial guess).

You could use seaborn.regplot (its logistic=True option fits a logistic regression via statsmodels):

import seaborn as sns

sns.regplot(x=xdata, y=ydata, logistic=True)

Output:
