I know there are several out-of-the-box methods of saving a model. However, I want to save the model as a database blob.
I've seen examples where people extract 'coefs' etc. from the estimator's dict, but when I try this with RandomForestRegressor (for example) it says there are no coefs.
I've seen other examples claiming all estimators have a 'save' method somewhere, but I can't get that to work.
Is there any way of getting the model data from a fitted estimator that works universally for all estimators?
I would then base64-encode the model data and persist it to my database for later use. There seems to be little documentation on this beyond just 'use pickle or joblib'.
asked Nov 20, 2024 at 18:50 by Richard (edited Nov 20, 2024 at 22:55 by desertnaut)

1 Answer
This is a summary of one method of serializing sklearn models.
pickle
works on most Python objects, and it works fine for sklearn models too. For example:
import base64
import pickle

def model_to_base64(model):
    # Serialize the fitted estimator to bytes, then base64-encode for text storage
    return base64.b64encode(pickle.dumps(model)).decode('utf-8')

def base64_to_model(encoded_model):
    # Decode the base64 text back to bytes and unpickle the estimator
    return pickle.loads(base64.b64decode(encoded_model))
Tested with several kinds of estimators:
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.svm import SVC, SVR
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

models = [
    RandomForestClassifier(n_estimators=10, random_state=42).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    RandomForestRegressor(n_estimators=10, random_state=42).fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    LogisticRegression().fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    LinearRegression().fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    SVC(probability=True).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    SVR().fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    KMeans(n_clusters=2, random_state=42).fit([[1, 2], [3, 4], [5, 6]]),
    DecisionTreeClassifier().fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', LogisticRegression())
    ]).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
]

for i, model in enumerate(models):
    encoded_model = model_to_base64(model)
    print(f"Model {i + 1} Serialized: {encoded_model[:100]}...")
    restored_model = base64_to_model(encoded_model)
    print(f"Model {i + 1} Restored: {restored_model.__class__.__name__}")
    assert isinstance(restored_model, model.__class__)
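Since the end goal is a database blob, note that base64 is optional: BLOB columns accept raw bytes, so you can store `pickle.dumps(model)` directly. A minimal sketch using sqlite3 (the table and column names here are made up for illustration, and any database with a bytes/BLOB type would work the same way):

```python
import pickle
import sqlite3

from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])

# In-memory DB for the demo; a real app would connect to its own database.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, blob BLOB)")

# Store the raw pickle bytes directly -- no base64 step needed for a BLOB column.
conn.execute("INSERT INTO models VALUES (?, ?)", ("linreg", pickle.dumps(model)))
conn.commit()

# Load the bytes back and unpickle to recover the fitted estimator.
row = conn.execute("SELECT blob FROM models WHERE name = ?", ("linreg",)).fetchone()
restored = pickle.loads(row[0])
print(restored.predict([[4]]))  # close to [8.]
```

One caveat: pickles are tied to the sklearn version that produced them, so deserializing under a different sklearn version may fail or misbehave; it's worth storing the version alongside the blob.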