
Custom Scikit-learn Model Persistence - Stack Overflow


I know there are several out of the box methods of saving the model.

However I want to save the model as a database blob.

I've seen examples where people extract attributes such as `coef_` from the estimator's dict, but when I try this with RandomForestRegressor (for example) there is no such attribute.

I've seen yet other examples that claim all estimators have a 'save' method somewhere but I can't get that to work.

Is there any way of getting the model data from the fitted estimator in a way that will work universally for all estimators?

I would then base64-encode the model data and persist it to my database for later use. There seems to be little documentation on this beyond "use pickle or joblib".


asked Nov 20, 2024 at 18:50 by Richard; edited Nov 20, 2024 at 22:55 by desertnaut

1 Answer 1


The scikit-learn documentation on model persistence gives a good summary of the available serialization methods.

pickle works on most Python objects, and it works fine for scikit-learn models too. For example:

import base64
import pickle

def model_to_base64(model):
    # Pickle the fitted estimator, then base64-encode it so the
    # result is a plain ASCII string safe for a text column.
    return base64.b64encode(pickle.dumps(model)).decode('utf-8')

def base64_to_model(encoded_model):
    # Reverse the steps: decode the base64 string, then unpickle.
    return pickle.loads(base64.b64decode(encoded_model))
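
Since the question also mentions joblib: the same in-memory round trip works with `joblib.dump`/`joblib.load`, which accept file-like objects. A minimal sketch, assuming `joblib` is installed (it ships as a scikit-learn dependency); the function names here are illustrative:

```python
import io

import joblib

def model_to_bytes(model):
    # joblib writes to any file-like object, so an in-memory
    # buffer yields raw bytes suitable for a BLOB column.
    buffer = io.BytesIO()
    joblib.dump(model, buffer)
    return buffer.getvalue()

def bytes_to_model(raw_bytes):
    # Wrap the bytes in a buffer again and let joblib rebuild the model.
    return joblib.load(io.BytesIO(raw_bytes))
```

joblib is mainly worth it for estimators carrying large numpy arrays, where it serializes more efficiently than plain pickle.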

Tested with several kinds of estimators:

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.svm import SVC, SVR
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

models = [
    RandomForestClassifier(n_estimators=10, random_state=42).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    RandomForestRegressor(n_estimators=10, random_state=42).fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    LogisticRegression().fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    LinearRegression().fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    SVC(probability=True).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    SVR().fit([[1, 2], [3, 4], [5, 6]], [10, 20, 30]),
    KMeans(n_clusters=2, random_state=42).fit([[1, 2], [3, 4], [5, 6]]),
    DecisionTreeClassifier().fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
    Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', LogisticRegression())
    ]).fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0]),
]

for i, model in enumerate(models):
    encoded_model = model_to_base64(model)
    print(f"Model {i + 1} Serialized: {encoded_model[:100]}...")
    
    restored_model = base64_to_model(encoded_model)
    print(f"Model {i + 1} Restored: {restored_model.__class__.__name__}")
    
    assert isinstance(restored_model, model.__class__)
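
If the target is a database BLOB column, the base64 step is optional, since BLOB columns accept raw bytes directly. A minimal sketch with SQLite (the `models` table name and schema here are hypothetical):

```python
import pickle
import sqlite3

from sklearn.linear_model import LinearRegression

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, blob BLOB)")

model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])

# Store the pickled bytes directly; BLOB columns need no base64.
conn.execute(
    "INSERT INTO models (name, blob) VALUES (?, ?)",
    ("linreg", pickle.dumps(model)),
)
conn.commit()

# Read the bytes back and unpickle to recover a working estimator.
raw = conn.execute(
    "SELECT blob FROM models WHERE name = ?", ("linreg",)
).fetchone()[0]
restored = pickle.loads(raw)
print(restored.predict([[4]]))
```

Base64 is only needed if the column is a text type, or if the bytes have to pass through a text-only channel (JSON, environment variables, etc.).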
