I have various pump performance data, and I am trying to fit curves as a combination of them.
In below image, different curves are at different pump speeds (RPM), and the Y axis is Pressure head, and X axis is flow rate.
I am trying to figure out the following functions:
pressure = func (flowrate, rpm)
flowrate = func (pressure, rpm)
rpm = func (flowrate, pressure)
Usually the pump curves follow some form of affinity laws, as
- flow rate is proportional to rpm.
- pressure is proportional to square of rpm.
However in reality - they do not, and there will be a difference between experimental data & theoretical data. So that is why I am looking to fit a function that looks like this: (open to other suggestions) - like 3rd degree polynomial as well.
Y = a*x1**2 + b*x1+ c*x2**2 + d*x2 + e*x1*x2 + f
However, I am getting the wrong results with scipy - curve_fit. I am not in favor of employing sklearn regression as we would not be able to figure out the coefficients.
Is there a way one can employ this in Python?
See my code below so far:
from numpy import genfromtxt
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
# df = pd.read_excel('./nominal_data/pump_curve.xlsx', sheet_name =0)
df = pd.DataFrame({
'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})
flow_lpm = df['Flow(lpm)'].to_numpy()
rpm = df['Speed Feedback (RPM)'].to_numpy()
dP = df['dP (PSI)'].to_numpy()
X = (flow_lpm, rpm)
y = dP
def f(X, a, b, c, d, e, f):
x_1, x_2 = X
return a*x_1**2 + b*x_1 + c*x_2**2 + d*x_2 + e*x_1*x_2 + f
param, param_cov = curve_fit(f, X, y)
# Test a random data point.
flow_test = 54
rpm_test = 5195
dp_test = param[0]*flow_test**2 + param[1]*flow_test + param[2]*rpm_test**2 + param[3]**rpm_test + param[4]*flow_test*rpm_test + param[5]
print (dp_test)
I could have multiple curves of Pressure = function (flow rate) at multiple RPMs like the figure shows. For example there is a 3rd degree curve for a 5100 RPM curve. However it only works for a given RPM. The curve is different for a 2070 RPM situation.
Seeking suggestions on proper curve fitting methodology. Open to deleting end points, to ensure the curve fit works in the middle range.
datafile_imagefile
I have various pump performance data, and I am trying to fit curves as a combination of them.
In below image, different curves are at different pump speeds (RPM), and the Y axis is Pressure head, and X axis is flow rate.
I am trying to figure out the following functions:
pressure = func (flowrate, rpm)
flowrate = func (pressure, rpm)
rpm = func (flowrate, pressure)
Usually the pump curves follow some form of affinity laws, as
- flow rate is proportional to rpm.
- pressure is proportional to square of rpm.
However in reality - they do not, and there will be a difference between experimental data & theoretical data. So that is why I am looking to fit a function that looks like this: (open to other suggestions) - like 3rd degree polynomial as well.
Y = a*x1**2 + b*x1+ c*x2**2 + d*x2 + e*x1*x2 + f
However, I am getting the wrong results with scipy - curve_fit. I am not in favor of employing sklearn regression as we would not be able to figure out the coefficients.
Is there a way one can employ this in Python?
See my code below so far:
from numpy import genfromtxt
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
# df = pd.read_excel('./nominal_data/pump_curve.xlsx', sheet_name =0)
df = pd.DataFrame({
'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})
flow_lpm = df['Flow(lpm)'].to_numpy()
rpm = df['Speed Feedback (RPM)'].to_numpy()
dP = df['dP (PSI)'].to_numpy()
X = (flow_lpm, rpm)
y = dP
def f(X, a, b, c, d, e, f):
x_1, x_2 = X
return a*x_1**2 + b*x_1 + c*x_2**2 + d*x_2 + e*x_1*x_2 + f
param, param_cov = curve_fit(f, X, y)
# Test a random data point.
flow_test = 54
rpm_test = 5195
dp_test = param[0]*flow_test**2 + param[1]*flow_test + param[2]*rpm_test**2 + param[3]**rpm_test + param[4]*flow_test*rpm_test + param[5]
print (dp_test)
I could have multiple curves of Pressure = function (flow rate) at multiple RPMs like the figure shows. For example there is a 3rd degree curve for a 5100 RPM curve. However it only works for a given RPM. The curve is different for a 2070 RPM situation.
Seeking suggestions on proper curve fitting methodology. Open to deleting end points, to ensure the curve fit works in the middle range.
datafile_imagefile
Share Improve this question edited Mar 7 at 17:07 Nick ODell 25.8k7 gold badges46 silver badges88 bronze badges asked Mar 6 at 22:15 Jesh KundemJesh Kundem 9833 gold badges18 silver badges32 bronze badges 3 |2 Answers
Reset to default 2With the following dataset you provided:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import optimize
data = pd.DataFrame({
'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})
Remove nan
s:
data = data.dropna()
Define your model as follow:
def model(x, a, b, c, d, e, f):
return a * x[:,0] ** 2 + b * x[:,0] + c * x[:,1] ** 2 + d * x[:,1] + e * x[:,0] * x[:,1] + f
Then optimize:
popt, pcov = optimize.curve_fit(model, data.iloc[:,:2].values, data.iloc[:,2].values)
Now we can regress the model on a finer grid:
x0lin = np.linspace(data.iloc[:,0].min(), data.iloc[:,0].max(), 200)
x1lin = np.linspace(data.iloc[:,1].min(), data.iloc[:,1].max(), 200)
X0, X1 = np.meshgrid(x0lin, x1lin)
Z = model(np.array([X0.ravel(), X1.ravel()]).T, *popt).reshape(X0.shape)
It renders as follow:
fig, axe = plt.subplots(subplot_kw={"projection": "3d"})
axe.scatter(*data.values.T)
axe.plot_surface(X0, X1, Z, cmap="viridis", alpha=0.35)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
df = pd.read_excel('pump_curve.xlsx', sheet_name =0)
columns = ['Flow(lpm)', 'Speed Feedback (RPM)', 'dP (PSI)']
df = df[columns].dropna()
columns2 = ['lpm', 'rpm', 'psi']
df = df.rename(columns=dict(zip(columns, columns2)))
def f(X, a, b, c, d, e, f):
x_1, x_2 = X
return a*x_1**2 + b*x_1 + c*x_2**2 + d*x_2 + e*x_1*x_2 + f
fig, ax = plt.subplots(1, 3, figsize=(15, 5))
for idx, name in enumerate(columns2):
y = df[name].to_numpy()
name1, name2 = columns2[:idx]+columns2[idx+1:]
X = (df[name1].to_numpy(), df[name2].to_numpy())
param, param_cov = curve_fit(f, X, y)
ax[idx].plot(f(X, *param), label='prediction');
ax[idx].plot(y, label='data');
ax[idx].legend();
ax[idx].set_title(f'{name} = f({name1}, {name2})');
plt.show();
df.to_dict(orient='list')
. This is makes the post more useful to future users in case the link breaks at some point. – Nick ODell Commented Mar 7 at 17:09