Can someone explain what is meant by:
"Both loc and iloc [in Pandas] are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second."
Because I thought when accessing arrays or lists of lists, the first index always represents the row:
matrix = [
[1,2,3], # row 1, index 0
[4,5,6], # row 2, index 1
[7,8,9] # row 3, index 2
]
print(matrix[1][2]) # Output = 6
asked 16 hours ago by Obase Oyeni Ayomobi, edited 16 hours ago by simon
2 Answers
This means that if you pass two coordinates, the first one indexes the rows, and the second one indexes the columns.
Assuming this example:
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8
df.iloc[0, 2] # first (0) row, third (2) column
# 2
You would actually get the same in pure python:
lst = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
lst[0][2] # 2
The side effect is that, with a single coordinate you default to the rows:
df.iloc[0] # first row
And if you want only a column, you should explicitly request all rows with a colon (:):
df.iloc[:, 2] # all rows, third column
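The three cases above can be put together in a minimal sketch. The answer does not show how its example frame was built, so the np.arange(9).reshape(3, 3) construction here is an assumption chosen to reproduce the printed values.

```python
import numpy as np
import pandas as pd

# Assumed construction of the 3x3 example frame (not shown in the answer).
df = pd.DataFrame(np.arange(9).reshape(3, 3))

cell = df.iloc[0, 2]        # two coordinates: row 0, column 2 -> a single value
first_row = df.iloc[0]      # one coordinate: defaults to rows -> a Series
third_col = df.iloc[:, 2]   # all rows, column 2 -> a Series

print(cell)                 # 2
print(first_row.tolist())   # [0, 1, 2]
print(third_col.tolist())   # [2, 5, 8]
```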
I would say that statement is incorrect or, at least, very misleading and likely to cause confusion.
Both iloc and loc are row-first, column-second, but this is exactly the same as how indexing works in native Python and in your example. The first index refers to the row, and the second index refers to the column.
Your example in pandas using iloc/loc also outputs 6:
import pandas as pd
data = [
[1, 2, 3], # row 0
[4, 5, 6], # row 1
[7, 8, 9] # row 2
]
df = pd.DataFrame(data)
print(df.iloc[1, 2])
# Output: 6
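The snippet above only exercises iloc. As a hedged illustration of the same ordering with loc, the sketch below attaches labels to the same data; the row labels r0..r2 and column labels A..C are hypothetical and not part of the original question.

```python
import pandas as pd

data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# Hypothetical labels, chosen only for illustration.
labeled = pd.DataFrame(data, index=["r0", "r1", "r2"], columns=["A", "B", "C"])

by_label = labeled.loc["r1", "C"]   # row label first, column label second
by_position = labeled.iloc[1, 2]    # row position first, column position second

print(by_label, by_position)        # 6 6
print(data[1][2])                   # 6, the same order in plain Python
```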
There has already been some discussion about this exact statement in this Kaggle discussion, but it is still not clear to me what the author was referring to.
According to Siraz Naorem's understanding, the statement might refer to the creation of DataFrames from column-oriented data, e.g. dictionaries, where each list or array represents a column, not a row.
If we replicate your example again but create the DataFrame from a dictionary like this:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(df)
# Output:
# A B C
# 0 1 4 7
# 1 2 5 8
# 2 3 6 9
Now, when we access index [1,2], we do not get 6:
print(df.iloc[1, 2])
# Output: 8
print(df.iloc[2, 1])
# Output: 6
In this case, the row and column indices might seem reversed, which may lead to the mistaken idea that indexing works differently: iloc[1,2] now gives us 8, and we have to use iloc[2,1] to get the value 6.
However, iloc/loc indexing itself has not changed and is still row-first, column-second. What is different is the structure of the DataFrame, since pandas has treated each list in the dictionary as a column.
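One way to check this is to compare the two constructions directly. The sketch below is only an illustration of the point above (the variable names are made up for this example): the dictionary-built frame is the transpose of the list-built one, and DataFrame.from_dict with orient="index" treats each dictionary value as a row instead.

```python
import pandas as pd

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cols = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}

df_rows = pd.DataFrame(rows)   # each inner list becomes a row
df_cols = pd.DataFrame(cols)   # each dict value becomes a column

# Same numbers, transposed relative to each other.
same_when_transposed = (df_cols.T.to_numpy() == df_rows.to_numpy()).all()

# orient="index" makes each dict value a row rather than a column.
df_as_rows = pd.DataFrame.from_dict(cols, orient="index")

print(same_when_transposed)    # True
print(df_cols.iloc[1, 2])      # 8
print(df_as_rows.iloc[1, 2])   # 6
```

In every case iloc reads row first and column second; only the layout of the data differs.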
df['col_name'][0], which selects a pandas Series and then the first row element of that Series, LOOKS like [col][row] and seems like Python-style indexing because it does not use loc or iloc. – user19077881 Commented 15 hours ago
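A small sketch of what this comment describes, reusing the dictionary-built frame from the answer above (the intermediate variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

col = df["C"]             # plain [] on a DataFrame selects a column (a Series)
chained = col[1]          # [] on that Series then selects by row label
direct = df.loc[1, "C"]   # the same cell, written row-first, column-second

print(chained, direct)    # 8 8
```

So df["C"][1] reads as [col][row], but it is two separate lookups rather than a single two-coordinate index.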