I have a linear regression asset in Dagster that uses previously computed data and scikit-learn's LinearRegression (Python 3.10 here).
For each of my input columns (each one represents a country), I want to fit a LinearRegression model.
Everything works fine. My question is about outputting these models (or should I use Dagster metadata?).
Basically, I want a train asset and a forecast asset, and for this I want to return the trained models from the train asset and load them in the forecast asset. One solution could be to save them locally, but I want to use Dagster exclusively.
Also, I would like to save plenty of metadata (score, RMSE) for each model in the train asset's metadata.
Here is my code:
from dagster import asset
from dagster_duckdb import DuckDBResource
from sklearn.linear_model import LinearRegression


@asset(deps=[])
def train_linear_regression(duckdb: DuckDBResource):
    """Use a pivot table of time series data to forecast.

    Uses linear regression.
    """
    # Set up the query.
    query = "SELECT * FROM pivot_table_model"
    # Execute the query.
    with duckdb.get_connection() as conn:
        df = conn.execute(query).df()
    # Feature matrix: the "year" column (assumed; X was not defined in the snippet).
    X = df[["year"]]
    output = {}
    for country_name in df.drop(columns=["year"]).columns:
        # Target: the population of this country (pd.Series).
        Y = df.loc[:, country_name]
        # Prepare and fit the linear model.
        linear_regression = LinearRegression()
        linear_regression.fit(X, Y)
        # Score the fit (R^2 on the training data).
        score = linear_regression.score(X, Y)
        output[country_name] = {
            "model": linear_regression,
            "score": float(score),
            # generate_plot is a helper defined elsewhere (not shown).
            "plot": generate_plot(df),
        }
What should I return?