
python - Logistic curve produced by curve_fit is a straight line - Stack Overflow


I'm trying to produce a Sigmoid/Logistic curve from some input data. I borrowed code from this post and this for plotting.

The result is the following:

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x, L, x0, k, b):
    return L / (1 + np.exp(-k * (x - x0))) + b

data = np.loadtxt("data.csv", delimiter=",")
xdata = data[0]
ydata = data[1]

p0 = [max(ydata), np.median(xdata), 1, min(ydata)]  # this is a mandatory initial guess
fitting_parameters, covariance = curve_fit(sigmoid, xdata, ydata, p0, method='dogbox', maxfev=10000)

plt.plot(xdata, ydata, 'o', label='Data')
plt.plot(xdata, sigmoid(xdata, *fitting_parameters), '-', label='Fit')
plt.legend()
plt.show()

This produces a straight line instead of a logistic fit:

What am I missing? I know the data is a bit coarse, but is that the cause?

EDIT: Here is the raw data, if useful:

1.15102,1.17397,1.18334,1.18484,1.2073,1.25081,1.26446,1.26535,1.26654,1.29653,1.30118,1.30991,1.32608,1.39597,1.39721,1.41225,1.415,1.41989,1.47602,1.19992,1.23148,1.2895,1.31759,1.33068,1.34391,1.35604,1.35879,1.37359,1.38695,1.40233,1.41753,1.42323,1.43474,1.44706,1.48247,1.50033,1.52272,1.59789,1.09956,1.10712,1.13576,1.16265,1.16993,1.18129,1.19587,1.1973,1.20428,1.23916,1.24522,1.2505,1.26135,1.26542,1.27122,1.2736,1.27456,1.30306,1.34639,1.16272,1.18929,1.28076,1.28145,1.28513,1.28708,1.30215,1.30236,1.30887,1.31634,1.37677,1.37745,1.38119,1.38846,1.43016,1.43046,1.43234,1.48051,1.54508
0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95

asked Mar 14 at 15:39 by Tyler Shellberg; edited Mar 14 at 15:44
  • You might have a bad initial guess and it always helps to compute the Jacobian and pass that argument. But without data to reproduce your results it will be difficult to help. – jared Commented Mar 14 at 15:42
  • I've appended the input data to the original post. I'm not very familiar with statistics. What is the importance of the 'initial guess'? What is the Jacobian, and how would I compute or pass it? – Tyler Shellberg Commented Mar 14 at 15:44
  • The data in that graph is positively correlated, but, beyond that, you'd have a hard job fitting any smooth curve to it. Are you sure that it's correct? – lastchance Commented Mar 14 at 15:47
  • It's the best I can get at the moment. For now I do not need perfection - just a 'good enough' logistic/sigmoid fit. – Tyler Shellberg Commented Mar 14 at 15:49
  • @mozway Numpy 2.2.3, Scipy version 1.15.2. – Tyler Shellberg Commented Mar 14 at 16:04

3 Answers

When I adjust your plotting code to zoom out, it fits nicely, in the scheme of things.

plt.plot(xdata, ydata, 'o', label='Data')
xlim = (-3, 3)
x = np.linspace(*xlim, 300)
plt.plot(x, sigmoid(x, *fitting_parameters), '-', label='Fit')
plt.xlim(*xlim)
plt.legend()
plt.show()

I suppose you want to see something like this:

p0 = [1, np.median(xdata), 15, 0]  # a hand-picked initial guess

plt.plot(xdata, ydata, 'o', label='Data')
xlim = (min(xdata)-1, max(xdata)+1)
x = np.linspace(*xlim, 300)
plt.plot(x, sigmoid(x, *p0), '-', label='Fit')
plt.xlim(*xlim)
plt.legend()
plt.show()

But if you try to use curve_fit to optimize the parameters from that guess, it makes the solution look like a straight line again, fitting to only a small portion of the curve. I think the best fit according to the least squares criterion just won't look like what you want it to look like. You would have to change the criterion from "minimize the sum of squared residuals" to something else, or add constraints on what the parameters can be (e.g. fix L and b). See, for example, scipy/scipy#21102 for some information about how to use SciPy's other optimization tools to do curve fitting manually, or you might want to look into the LMFIT package.
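For the constrained route, `curve_fit` itself accepts a `bounds` argument (supported by the 'dogbox' method), which can effectively pin L and b. A minimal sketch on synthetic data (not the question's data; the bound values and true parameters are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, L, x0, k, b):
    return L / (1 + np.exp(-k * (x - x0))) + b

# Synthetic data from a known logistic curve, with a little noise
rng = np.random.default_rng(0)
x = np.linspace(1.0, 1.7, 80)
y = sigmoid(x, 1, 1.35, 12, 0) + rng.normal(0, 0.02, x.size)

# Tight bounds effectively fix L ~= 1 and b ~= 0,
# leaving only x0 and k free in practice
lower = [0.999, -np.inf, 1e-6, -0.001]
upper = [1.001, np.inf, np.inf, 0.001]
popt, _ = curve_fit(sigmoid, x, y, p0=[1, 1.3, 10, 0],
                    bounds=(lower, upper), method='dogbox')
```

Tight bounds around 1 and 0 reduce the problem to the two parameters that matter here, x0 and k, without changing the model function's signature.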


Update:

Maybe what you want is to fix L = 1 and b = 0 to reflect the fact that you're working with percentages.

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

xdata = np.asarray([1.15102,1.17397,1.18334,1.18484,1.2073,1.25081,1.26446,1.26535,1.26654,1.29653,1.30118,1.30991,1.32608,1.39597,1.39721,1.41225,1.415,1.41989,1.47602,1.19992,1.23148,1.2895,1.31759,1.33068,1.34391,1.35604,1.35879,1.37359,1.38695,1.40233,1.41753,1.42323,1.43474,1.44706,1.48247,1.50033,1.52272,1.59789,1.09956,1.10712,1.13576,1.16265,1.16993,1.18129,1.19587,1.1973,1.20428,1.23916,1.24522,1.2505,1.26135,1.26542,1.27122,1.2736,1.27456,1.30306,1.34639,1.16272,1.18929,1.28076,1.28145,1.28513,1.28708,1.30215,1.30236,1.30887,1.31634,1.37677,1.37745,1.38119,1.38846,1.43016,1.43046,1.43234,1.48051,1.54508])
ydata = np.asarray([0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95])

p0 = [np.median(xdata), 15]  # a hand-picked initial guess
fitting_parameters, covariance = curve_fit(sigmoid, xdata, ydata, p0, method='dogbox', maxfev=10000)

plt.plot(xdata, ydata, 'o', label='Data')
xlim = (min(xdata)-1, max(xdata)+1)
x = np.linspace(*xlim, 300)
plt.plot(x, sigmoid(x, *fitting_parameters), '-', label='Fit')
plt.xlim(*xlim)
plt.legend()
plt.show()

Then the best fit is:


In response to your follow-up question, to solve for x at a particular y, you can invert the equation algebraically:

p = 0.2
x0, k = fitting_parameters
-np.log(1 / p - 1) / k + x0
# 1.1584171856437602

Or you can solve numerically:

from scipy.optimize import root_scalar

def f(x, p):
    return sigmoid(x, *fitting_parameters) - p

p = 0.2
root_scalar(f, bracket=[0.5, 2.0], args=(p,))
#       converged: True
#            flag: converged
#  function_calls: 12
#      iterations: 11
#            root: 1.1584171856442531
#          method: brentq

The raw data appear to be several repeats of the same type of measurement with different underlying parameters. Isolating one of the sequences gives something vaguely plausible (there is a lot of noise in some of them). The reason the plot looks so crazy is that y restarts from 0.05 to 0.95 several times, which would be immediately apparent if the data points were joined by lines.
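The restart structure is easy to recover programmatically. A small sketch, assuming (as in the question's data) that each run of y climbs monotonically and a new run begins wherever y drops:

```python
import numpy as np

# The question's y-values: four runs of 0.05..0.95 in steps of 0.05
ydata = np.tile(np.linspace(0.05, 0.95, 19), 4)

# A run restarts wherever y decreases; split the array at those points
restarts = np.where(np.diff(ydata) < 0)[0] + 1
runs = np.split(ydata, restarts)
```

The same `restarts` indices can be used with `np.split(xdata, restarts)` so each x-series can then be fitted on its own.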

The relevant data (2nd series in the compound dataset) that can be fitted are:

x y model error
1.19992 0.05 0.03939 0.010607
1.23148 0.1 0.06522 0.034780
1.2895 0.15 0.156365 -0.006365
1.31759 0.2 0.229256 -0.029256
1.33068 0.25 0.270499 -0.020499
1.34391 0.3 0.316629 -0.016629
1.35604 0.35 0.362380 -0.012380
1.35879 0.4 0.373147 0.026853
1.37359 0.45 0.433025 0.016975
1.38695 0.5 0.488865 0.011135
1.40233 0.55 0.553406 -0.003406
1.41753 0.6 0.615478 -0.015478
1.42323 0.65 0.637928 0.012072
1.43474 0.7 0.681397 0.018603
1.44706 0.75 0.724653 0.025347
1.48247 0.8 0.826917 -0.026917
1.50033 0.85 0.865842 -0.015842
1.52272 0.9 0.903933 -0.003933
1.59789 0.95 0.970901 -0.020901

I chose a much simpler model for the sigmoid, namely s(x) = (tanh(A*(x-x0))+1)/2.
I don't think the data are adequate to fit any more parameters than that. This is the graph of the least-squares model fit with A = 8.42 and x0 = 1.39; I slightly prefer a by-eye fit of A = 8.
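As a sketch of this tanh fit, using the second series tabulated above (the p0 values are guesses; the answer reports roughly A = 8.42, x0 = 1.39):

```python
import numpy as np
from scipy.optimize import curve_fit

# Second series from the question's compound dataset
x = np.array([1.19992, 1.23148, 1.2895, 1.31759, 1.33068, 1.34391, 1.35604,
              1.35879, 1.37359, 1.38695, 1.40233, 1.41753, 1.42323, 1.43474,
              1.44706, 1.48247, 1.50033, 1.52272, 1.59789])
y = np.linspace(0.05, 0.95, 19)

def s(x, A, x0):
    # Two-parameter sigmoid: equivalent to a logistic with L=1, b=0, k=2A
    return (np.tanh(A * (x - x0)) + 1) / 2

popt, _ = curve_fit(s, x, y, p0=[8, np.median(x)])
```

With L and b removed from the model there is nothing left for the optimizer to flatten, so the fit stays S-shaped even on noisy runs.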

Fitting a model to each data series in the dataset will give much more sensible results. The average fit over the entire dataset taken whole is pretty meaningless (and rather sensitive to initial guess).

You could use seaborn.regplot (its logistic=True option fits a logistic regression via statsmodels):

import seaborn as sns

sns.regplot(x=xdata, y=ydata, logistic=True)

Output:
