
scikit-learn Pipeline returns IndexError: too many indices for array

Problem description

I'm trying to get to grips with scikit-learn for some simple machine learning projects, but I'm coming unstuck with Pipelines and wonder what I've done wrong...

I'm trying to work through a tutorial on Kaggle.

Here is my code:

```python
import pandas as pd
from numpy import arange  # needed for the arange() calls below

from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.grid_search import GridSearchCV  # older releases; later moved to sklearn.model_selection
from sklearn.pipeline import Pipeline

train = pd.read_csv("local path to training data")
train_labels = pd.read_csv("local path to labels")

pca = PCA()
clf = LinearSVC()

# candidate values for the grid search
n_components = arange(1, 39)
loss = ['l1', 'l2']
penalty = ['l1', 'l2']
C = arange(0, 1, .1)
whiten = [True, False]

# set up pipeline
pipe = Pipeline(steps=[('pca', pca), ('clf', clf)])

# set up GridSearchCV
estimator = GridSearchCV(pipe, dict(pca__n_components=n_components,
                                    pca__whiten=whiten,
                                    clf__loss=loss,
                                    clf__penalty=penalty,
                                    clf__C=C))
estimator
```

Which returns:

```
GridSearchCV(cv=None,
       estimator=Pipeline(steps=[('pca', PCA(copy=True, n_components=None, whiten=False)),
                                 ('clf', LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
                                                   intercept_scaling=1, loss='l2', multi_class='ovr',
                                                   penalty='l2', random_state=None, tol=0.0001, verbose=0))]),
       fit_params={}, iid=True, loss_func=None, n_jobs=1,
       param_grid={'clf__penalty': ['l1', 'l2'],
                   'clf__loss': ['l1', 'l2'],
                   'clf__C': array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]),
                   'pca__n_components': array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
                          18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
                          35, 36, 37, 38]),
                   'pca__whiten': [True, False]},
       pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
       verbose=0)
```

但是当我尝试训练数据时:

But when I try to fit it to the training data:

```python
estimator.fit(train, train_labels)
```

the error is:

```
    428     for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    429         for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 430             label_test_folds = test_folds[y == label]
    431             # the test split can be too big because we used
    432             # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array
```
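The failing line boolean-indexes the 1-D `test_folds` array with the mask `y == label`. A minimal NumPy sketch with hypothetical toy arrays (not the tutorial data) shows that the same indexing succeeds when the labels are 1-D and raises this IndexError when they are a one-column 2-D array:

```python
import numpy as np

# Hypothetical toy stand-ins for the variables in the traceback.
# test_folds is 1-D: one fold index per sample.
test_folds = np.zeros(6, dtype=int)

y_1d = np.array([0, 1, 0, 1, 1, 0])  # labels as a 1-D vector, shape (6,)
y_2d = y_1d.reshape(-1, 1)           # same labels as a one-column 2-D array, shape (6, 1)

# With 1-D labels the boolean mask has shape (6,) and indexing works.
print(test_folds[y_1d == 1].shape)  # (3,)

# With 2-D labels the mask has shape (6, 1): two indices for a 1-D array.
error_message = None
try:
    test_folds[y_2d == 1]
except IndexError as exc:
    error_message = str(exc)
print("IndexError:", error_message)
```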

Can anyone point me in the right direction?

Recommended answer

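It turns out that the Pandas DataFrame was the wrong shape. `train_labels` is a one-column DataFrame, so it reaches the stratified cross-validation splitter as a 2-D array of shape (n_samples, 1) instead of a 1-D label vector. The mask `y == label` in the traceback is therefore 2-D and cannot index the 1-D `test_folds` array.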
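A minimal sketch of the shape difference, assuming the labels file has no header row and is read with `header=None` (which is what makes `train_labels[0]` a valid column lookup); the in-memory CSV is a hypothetical stand-in for the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the labels file: one label per row, no header line.
# header=None is an assumption; it makes the single column addressable as [0].
labels_csv = io.StringIO("1\n0\n0\n1\n")
train_labels = pd.read_csv(labels_csv, header=None)

print(train_labels.values.shape)     # (4, 1): 2-D, the shape that broke the splitter
print(train_labels[0].values.shape)  # (4,): 1-D label vector

# Two other ways to get a 1-D label vector from the one-column frame:
y = train_labels.iloc[:, 0].to_numpy()   # select the single column by position
y_alt = train_labels.to_numpy().ravel()  # flatten the (n, 1) array
```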

```python
estimator.fit(train.values, train_labels[0].values)
```

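works, although I also had to drop the penalty parameter (`clf__penalty`) from the grid, since `LinearSVC` does not accept every combination of `loss` and `penalty`.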
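For reference, a sketch of the same search on a current scikit-learn release, under these assumptions: `GridSearchCV` comes from `sklearn.model_selection` (the old `sklearn.grid_search` module was superseded and later removed), synthetic data from `make_classification` replaces the Kaggle files, and `LinearSVC` uses its current loss names `'hinge'` / `'squared_hinge'` (formerly `'l1'` / `'l2'`). The penalty stays at its default `'l2'`, mirroring the answer; `penalty='l1'` is only accepted together with `loss='squared_hinge'` and `dual=False`. The `C` values start above 0 because `C` must be strictly positive, and the grid is trimmed so it runs quickly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the Kaggle data (assumption): 200 samples, 10 features.
# y is already a 1-D array here, which is what the CV splitter needs.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline(steps=[
    ("pca", PCA()),
    # dual=True is set explicitly so both loss values below are valid;
    # penalty keeps its default 'l2' and is not searched.
    ("clf", LinearSVC(dual=True, max_iter=10000)),
])

param_grid = {
    "pca__n_components": [2, 5, 8],
    "pca__whiten": [True, False],
    "clf__loss": ["hinge", "squared_hinge"],  # current names for 'l1' / 'l2'
    "clf__C": np.arange(0.1, 1.0, 0.2),       # start above 0: C must be > 0
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

With the pandas frames from the question, the equivalent call is `search.fit(train.values, train_labels[0].values)`, as in the answer.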
