python - numpy.cov() Giving Incorrect Answer - Stack Overflow

I'm trying to use Python and NumPy to calculate a covariance matrix.

Here is the matrix:

[[0.69, 0.49],
 [-1.31, -1.21],
 [0.39, 0.99],
 [0.09, 0.29],
 [1.29, 1.09],
 [0.49, 0.79],
 [0.19, -0.31],
 [-0.81, -0.81],
 [-0.31, -0.31],
 [-0.71, -1.01]]

Here is the expected result:

[[0.7322, 0.6189],
 [0.6189, 0.5956]]

Here is the equation I was given: covariance matrix = (1 / (n - 1)) Z^T Z

n is the number of entries (10)

Z is the matrix

I tried np.cov(np_matrix), which didn't return the correct size or values.

I've also tried this:

np.cov(np_matrix, rowvar=False)

>>> array([[0.61655556, 0.61544444],
           [0.61544444, 0.71655556]])

I'm also trying to calculate it manually instead of using the cov function, but multiplying by the transpose doesn't return the value I expect either.
The value I expect for Z^T Z is this:

[[6.59, 5.57],
 [5.57, 5.36]]

However, this is what my code returns:

np_matrix.T @ np_matrix

>>> array([[5.549, 5.539],
           [5.539, 6.449]])

np_matrix.T * np_matrix

>>> ValueError: operands could not be broadcast together with shapes (2,10) (10,2) 

I've also tried setting the arrays to floats and doubles.

Asked Mar 13 at 21:17 by marbledcrystals; edited Mar 13 at 21:39 by Ben Grossmann.

Comments:
  • Where did you get that formula? That's the wrong formula. NumPy is correct. – user2357112 Commented Mar 13 at 21:40
  • Applying your formula np_matrix.T @ np_matrix/(np_matrix.shape[0] - 1) yields the same result as the np.cov function (with rowvar=False) – Ben Grossmann Commented Mar 13 at 21:45
  • @BenGrossmann: The formula doesn't look right to me. There should be a step where it compensates for the mean. – user2357112 Commented Mar 13 at 21:46
  • @user2357112 That was a bit careless of me. I assumed that we already had a matrix with the mean subtracted out (which is indeed true in this case) – Ben Grossmann Commented Mar 13 at 21:48
  • @BenGrossmann: Looking at the mean of the data, that does look like the case. And you're right about the formula agreeing with NumPy's output, rather than the "expected" result. – user2357112 Commented Mar 13 at 21:49
1 Answer

The formula you have been given applies only if your matrix Z is mean-centred, i.e. Z = X − μ, where μ is the vector of column means of X. The following code demonstrates this:

import numpy as np

X = np.array([
    [0.69, 0.49],
    [-1.31, -1.21],
    [0.39, 0.99],
    [0.09, 0.29],
    [1.29, 1.09],
    [0.49, 0.79],
    [0.19, -0.31],
    [-0.81, -0.81],
    [-0.31, -0.31],
    [-0.71, -1.01]
])

# Compute the mean
mean_X = np.mean(X, axis=0)

# Mean-center the data
Z = X - mean_X

n = X.shape[0]  # Number of samples (rows)
cov = (1 / (n - 1)) * (Z.T @ Z)
print(cov)

Result:

[[0.61655556 0.61544444]
 [0.61544444 0.71655556]]
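
For this particular data the centring step changes nothing, because the column means are already zero, as the comments on the question point out. The following is a minimal sketch to check that, not part of the original answer; the names col_means, already_centred and uncentred are illustrative only.

```python
import numpy as np

# Same data as in the question: 10 samples (rows), 2 variables (columns).
X = np.array([
    [0.69, 0.49],
    [-1.31, -1.21],
    [0.39, 0.99],
    [0.09, 0.29],
    [1.29, 1.09],
    [0.49, 0.79],
    [0.19, -0.31],
    [-0.81, -0.81],
    [-0.31, -0.31],
    [-0.71, -1.01],
])

n = X.shape[0]

# Column means; for this data they are zero up to floating-point rounding.
col_means = X.mean(axis=0)
already_centred = np.allclose(col_means, 0.0)

# The formula from the question, applied WITHOUT subtracting the mean.
uncentred = X.T @ X / (n - 1)

print(already_centred)
print(np.allclose(uncentred, np.cov(X, rowvar=False)))
```

If both checks print True, the formula from the question and np.cov agree on this data, so the discrepancy lies in the "expected" values rather than in NumPy.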

The NumPy way:

print(np.cov(X, rowvar=False))

Result:

[[0.61655556 0.61544444]
 [0.61544444 0.71655556]]
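
The two other symptoms in the question, the wrong output size from np.cov(np_matrix) and the ValueError from the * operator, can be reproduced as follows. This is a sketch under the assumption that np_matrix holds the same 10×2 data as X; the variable names are illustrative. By default np.cov treats each row as a variable (rowvar=True), and * is element-wise multiplication, whereas @ is the matrix product.

```python
import numpy as np

# Same 10x2 data as above (assumed to match np_matrix in the question).
X = np.array([
    [0.69, 0.49],
    [-1.31, -1.21],
    [0.39, 0.99],
    [0.09, 0.29],
    [1.29, 1.09],
    [0.49, 0.79],
    [0.19, -0.31],
    [-0.81, -0.81],
    [-0.31, -0.31],
    [-0.71, -1.01],
])

# Default rowvar=True: each of the 10 rows is treated as a variable,
# so the result is a 10x10 matrix.
default_cov = np.cov(X)

# rowvar=False: each of the 2 columns is a variable, giving a 2x2 matrix.
column_cov = np.cov(X, rowvar=False)

# Transposing the input is an equivalent way to get the same 2x2 result.
transposed_cov = np.cov(X.T)

# @ is the matrix product: (2, 10) @ (10, 2) -> (2, 2).
gram = X.T @ X

# * is element-wise and needs broadcast-compatible shapes;
# (2, 10) and (10, 2) are not compatible, so NumPy raises ValueError.
try:
    X.T * X
    elementwise_error = None
except ValueError as exc:
    elementwise_error = str(exc)

print(default_cov.shape, column_cov.shape, gram.shape)
print(elementwise_error)
```

Passing rowvar=False or transposing the input are interchangeable here; which to use is a matter of taste.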