I know there are several out-of-the-box methods of saving a model. However, I want to save the model as a database blob.
I've seen examples where people extract 'coefs' etc. from the estimator's dict, but when I try this with RandomForestRegressor (for example) it says there are no coefs.
I've seen other examples claiming all estimators have a 'save' method somewhere, but I can't get that to work.
Is there any way of getting the model data from a fitted estimator that works universally for all estimators?
I would then base64-encode the model data and persist it to my database for later use. There seems to be little documentation on this beyond just 'use pickle or joblib'.
asked Nov 20, 2024 at 18:50 by Richard (edited Nov 20, 2024 at 22:55 by desertnaut)

1 Answer
This is a summary of one method of serializing sklearn models.
pickle
works on most Python objects, and it works fine for sklearn models too. For example:
import base64
import pickle

def model_to_base64(model):
    # Serialize the fitted estimator to bytes, then base64-encode for text storage
    return base64.b64encode(pickle.dumps(model)).decode('utf-8')

def base64_to_model(encoded_model):
    # Decode the base64 text back to bytes and unpickle the estimator
    return pickle.loads(base64.b64decode(encoded_model))
Tested with several kinds of estimators:
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.svm import SVC, SVR
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

models = [
    RandomForestClassifier(n_estimators=10, random_state=42).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    RandomForestRegressor(n_estimators=10, random_state=42).fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    LogisticRegression().fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    LinearRegression().fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    SVC(probability=True).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    SVR().fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    KMeans(n_clusters=2, random_state=42).fit([[1, 2], [3, 4], [5, 6]]),
    DecisionTreeClassifier().fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', LogisticRegression())
    ]).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
]

for i, model in enumerate(models):
    encoded_model = model_to_base64(model)
    print(f"Model {i + 1} Serialized: {encoded_model[:100]}...")
    restored_model = base64_to_model(encoded_model)
    print(f"Model {i + 1} Restored: {restored_model.__class__.__name__}")
    assert isinstance(restored_model, model.__class__)
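Since the end goal is a database blob, note that base64 is optional: BLOB columns accept raw bytes, so you can store `pickle.dumps(model)` directly. A minimal sketch using sqlite3 (the table and column names here are made up for illustration, and any database with a bytes/BLOB type would work the same way):

```python
import pickle
import sqlite3

from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])

# In-memory DB for the demo; a real app would connect to its own database.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, blob BLOB)")

# Store the raw pickle bytes directly -- no base64 step needed for a BLOB column.
conn.execute("INSERT INTO models VALUES (?, ?)", ("linreg", pickle.dumps(model)))
conn.commit()

# Load the bytes back and unpickle to recover the fitted estimator.
row = conn.execute("SELECT blob FROM models WHERE name = ?", ("linreg",)).fetchone()
restored = pickle.loads(row[0])
print(restored.predict([[4]]))  # close to [8.]
```

One caveat: pickles are tied to the sklearn version that produced them, so deserializing under a different sklearn version may fail or misbehave; it's worth storing the version alongside the blob.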