python - Pandas indexing - Stack Overflow

Can someone explain what is meant by the following statement?

Both loc and iloc [in Pandas] are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.

Because I thought when accessing arrays or lists of lists, the first index always represents the row:

matrix = [
    [1,2,3], # row 1, index 0
    [4,5,6], # row 2, index 1
    [7,8,9] # row 3, index 2
]
print(matrix[1][2]) # Output = 6

Asked 16 hours ago by Obase Oyeni Ayomobi (new contributor); edited 16 hours ago by simon.

Welcome to Stack Overflow! Where did you find that? You're right that both systems use row-first, column-second order – Proteus Commented 16 hours ago
@Proteus There have been several discussions around that sentence; googling it brings up, for example, this and this. The second link refers to the "Indexing, Selecting & Assignment" notebook of the Pandas series, which can be found here and which still contains the sentence in question. – simon Commented 16 hours ago
I agree that this sentence makes no sense in the given context. Typically, you run into indexing order problems when working with images or volumes, where the layout of x,y,z axes does not necessarily match the i,j,k indexing order within the array. Thus, it depends on the context, what 'rows' and 'columns' are. For pandas, however, it is pretty clear and in-line with what you would expect from indexing in native python. – André Commented 16 hours ago
I think that the contentious statement refers to something like df['col_name'][0], which selects a Pandas Series and then the first row element of that Series. It LOOKS like [col][row] and seems like Python-style indexing because it does not use loc or iloc. – user19077881 Commented 15 hours ago
@user19077881 But wouldn't that mean that the quote (while still being misleading/wrong) should be exactly the other way round, i.e. column-first, row-second in Pandas vs. row-first, column-second in native Python? – simon Commented 12 hours ago

2 Answers

This means that if you pass two coordinates, the first one indexes the rows, and the second the columns.

Assuming this example:

df.iloc[0, 2]  # first (0) row, third (2) column
# 2

You would actually get the same in pure python:

lst = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

lst[0][2] # 2

The side effect is that, with a single coordinate you default to the rows:

df.iloc[0] # first row

And if you want only a column, you should explicitly request all rows with a colon (:):

df.iloc[:, 2] # all rows, third column
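The snippets above do not show how df was built. A minimal runnable sketch, assuming df holds the same values as the nested list lst (that construction is an assumption, not something stated in the answer):

```python
import pandas as pd

# Assumption: the DataFrame mirrors the nested list used in the answer
lst = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
df = pd.DataFrame(lst)

cell = df.iloc[0, 2]       # row 0, column 2
first_row = df.iloc[0]     # a single coordinate selects a row (returned as a Series)
third_col = df.iloc[:, 2]  # all rows, column 2 (returned as a Series)

print(cell)                # 2, same as lst[0][2]
print(first_row.tolist())  # [0, 1, 2]
print(third_col.tolist())  # [2, 5, 8]
```

In all three cases the first position addresses rows and the second addresses columns, just as with lst[0][2].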

I would say that statement is incorrect or, at least, very misleading and likely to cause confusion.

Both iloc and loc are row-first and column-second, but this is exactly how indexing works in native Python and in your example: the first index refers to the row, and the second index refers to the column.

Your example in pandas using iloc/loc also outputs 6:

import pandas as pd

data = [
    [1, 2, 3], # row 0
    [4, 5, 6], # row 1
    [7, 8, 9]  # row 2
]

df = pd.DataFrame(data)

print(df.iloc[1, 2])

# Output: 6
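The example above only exercises iloc. As a small sketch of the loc side, the same lookup can be written with labels; the labels "r0".."r2" and "a".."c" below are hypothetical and chosen only for illustration:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(data)  # default labels: rows 0..2, columns 0..2

by_position = df.iloc[1, 2]  # position-based: second row, third column
by_label = df.loc[1, 2]      # label-based; the default labels equal the positions here

# Hypothetical row and column labels, for illustration only
labelled = pd.DataFrame(data, index=["r0", "r1", "r2"], columns=["a", "b", "c"])
by_name = labelled.loc["r1", "c"]  # still row label first, column label second

print(by_position, by_label, by_name)  # 6 6 6
```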

There has already been some discussion about this exact statement in this Kaggle discussion, but it is still not clear to me what the author was referring to.

According to Siraz Naorem's understanding, the statement might refer to creating DataFrames from column-oriented data, e.g. dictionaries, where each list or array represents a column, not a row.

If we replicate your example again, but create the DataFrame from a dictionary like this:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

print(df)
# Output:    
#    A  B  C
# 0  1  4  7
# 1  2  5  8
# 2  3  6  9

Now, when we access index [1,2], we do not get 6:

print(df.iloc[1, 2]) 
# Output: 8

print(df.iloc[2, 1]) 
# Output: 6

In this case, the row and column indices might seem reversed, which may lead to the mistaken idea that indexing works differently: iloc[1,2] now gives us 8, and we have to use iloc[2,1] to get the value 6.

However, iloc/loc indexing itself has not changed; it is still row-first and column-second. What is different is the structure of the DataFrame, since pandas has treated each list in the dictionary as a column.
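One way to check this is to compare the two constructions directly. The sketch below (variable names are mine) suggests that the dictionary-built frame is simply the transpose of the list-built one, and that chained access such as df['B'][2], mentioned in the comments, reads as column-then-row:

```python
import pandas as pd

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

from_rows = pd.DataFrame(rows)  # each inner list becomes a row
from_cols = pd.DataFrame({'A': rows[0], 'B': rows[1], 'C': rows[2]})  # each list becomes a column

# Same values, transposed layout
is_transpose = bool((from_cols.to_numpy() == from_rows.to_numpy().T).all())

value_rows = from_rows.iloc[1, 2]  # row 1, column 2 of the row-built frame
value_cols = from_cols.iloc[2, 1]  # row 2, column 1 of the column-built frame
chained = from_cols['B'][2]        # select column 'B' first, then row label 2

print(is_transpose)                      # True
print(value_rows, value_cols, chained)   # 6 6 6
```

The chained form picks a column (a Series) and then a row label within it, which may be the "column-first" pattern the disputed sentence had in mind; iloc and loc themselves stay row-first.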
