How can I iterate over rows in a Pandas DataFrame?


When working with pandas, you have probably needed to process each row of a DataFrame at some point. Although pandas is designed for fast, vectorized operations, row-wise iteration is still useful in certain scenarios. In this blog, we look at several ways to iterate over the rows of a DataFrame and when to use each of them.

Method 1: Using iterrows() – For smaller datasets

The iterrows() method loops through the DataFrame and yields each row as an (index, Series) pair. It is simple to use, but it is slow on larger datasets because it creates a new Series object for every row. It is best suited to small datasets and quick operations.

Let us now understand this with the help of an example:

Imagine you are grading a small list of students based on their marks.

import pandas as pd
test_data = {'Name': ['Eva', 'Bobby', 'Charles'], 'Score': [85, 62, 90]}
df = pd.DataFrame(test_data)
for index, row in df.iterrows():
    test_grade = 'A' if row['Score'] >= 80 else 'B'
    print(f"{row['Name']} scored {row['Score']} and got grade {test_grade}.")

Output:

Eva scored 85 and got grade A.  
Bobby scored 62 and got grade B.  
Charles scored 90 and got grade A.
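Because iterrows() hands you each row as a pandas Series, values from columns with different numeric types are coerced to one common dtype, so an integer column can come back as floats inside the loop. The sketch below uses a small hypothetical frame (the column names qty and price are made up for illustration) to make that visible:

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
df = pd.DataFrame({"qty": [1, 2], "price": [0.5, 0.25]})

row_dtypes = []
for index, row in df.iterrows():
    # Each row is a single Series, so the int64 and float64 values are upcast to float64
    row_dtypes.append(str(row.dtype))
    print(index, row["qty"], row.dtype)

# The column in the DataFrame itself keeps its original integer dtype
print(df.dtypes["qty"])
```

If exact types matter inside the loop, itertuples() (Method 2) avoids this per-row Series conversion.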

Method 2: Using itertuples() – For larger datasets

itertuples() yields each row as a namedtuple, which makes it faster and more memory-efficient than iterrows(). It is the better choice for larger datasets where performance matters.

For example, suppose you need to calculate each employee's annual salary from their monthly salary. The sample below has only three rows, but the same loop applies to a much larger table:

test_data = {'Employee': ['Harry', 'Hermione', 'Ron'], 'Monthly Salary': [3000, 4000, 3500]}
df = pd.DataFrame(test_data)
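for row in df.itertuples():
    # 'Monthly Salary' contains a space, so it is not a valid attribute name;
    # itertuples() renames it to _2, its position counting the Index field as 0
    annual_salary = row._2 * 12
    print(f"{row.Employee} earns {annual_salary} annually.")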

Output:

Harry earns 36000 annually.  
Hermione earns 48000 annually.  
Ron earns 42000 annually.
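Relying on a positional name like row._2 is fragile: if a column is added or reordered, the position shifts. Two safer options are sketched below: inspecting the field names pandas generated, or renaming the column to a valid identifier before iterating (monthly_salary is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'Employee': ['Harry', 'Hermione', 'Ron'],
                   'Monthly Salary': [3000, 4000, 3500]})

# Option 1: look at the field names of the generated namedtuple
first = next(df.itertuples())
print(first._fields)  # ('Index', 'Employee', '_2')

# Option 2: rename the column so it can be accessed by a readable name
clean = df.rename(columns={'Monthly Salary': 'monthly_salary'})
annual = [row.monthly_salary * 12 for row in clean.itertuples(index=False)]
print(annual)  # [36000, 48000, 42000]
```

Passing index=False simply drops the Index field when you do not need it.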

Method 3: Using apply() – For complex row-wise transformations

apply() calls a function on each column (axis=0, the default) or on each row (axis=1). It lets you express row-wise calculations concisely and is handy for complex logic that combines values from several columns. Note, however, that apply() with axis=1 is not truly vectorized: it still calls your Python function once per row, so it is usually slower than arithmetic on whole columns.

For example, suppose you want to calculate the Body Mass Index (BMI) for a group of people.

test_data = {'Name': ['Eva', 'Bobby'], 'Weight (kg)': [70, 85], 'Height (m)': [1.75, 1.80]}
df = pd.DataFrame(test_data)

df['BMI'] = df.apply(lambda row: row['Weight (kg)'] / (row['Height (m)'] ** 2), axis=1)
print(df)

Output:

    Name  Weight (kg)  Height (m)        BMI
0    Eva           70        1.75  22.857143
1  Bobby           85        1.80  26.234568
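For plain arithmetic like this, the same column can usually be computed without apply() at all, by operating on whole columns at once. This is what is meant by a vectorized operation, and on large frames it is generally much faster than apply(axis=1). A minimal sketch reusing the BMI data:

```python
import pandas as pd

test_data = {'Name': ['Eva', 'Bobby'], 'Weight (kg)': [70, 85], 'Height (m)': [1.75, 1.80]}
df = pd.DataFrame(test_data)

# Column arithmetic is evaluated for all rows at once, with no per-row Python call
df['BMI'] = df['Weight (kg)'] / df['Height (m)'] ** 2
print(df)
```

apply() remains useful when the per-row logic cannot be expressed as column operations, for example when it involves branching or calls to external functions.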

Method 4: Index-based Iteration (iloc[] or loc[]) – For specific rows

iloc[] (position-based) and loc[] (label-based) give you precise indexing when you want to read or update specific rows in a DataFrame. They are useful when you need fine-grained control over which rows you access or modify, for example when applying conditional updates.

For example, suppose you want to flag transactions above a certain amount in a financial dataset:

test_data = {'Transaction ID': [101, 102, 103], 'Amount': [500, 1500, 750]}
df = pd.DataFrame(test_data)

# Iterate over the index labels so loc[] also works if the index is not 0..n-1
for i in df.index:
    if df.loc[i, 'Amount'] > 1000:
        df.loc[i, 'Flag'] = 'High'
    else:
        df.loc[i, 'Flag'] = 'Normal'
print(df)

Output:

   Transaction ID  Amount    Flag
0             101     500  Normal
1             102    1500    High
2             103     750  Normal
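The same flags can also be set without an explicit loop by passing a boolean mask to loc[], which updates every matching row in one step. A sketch, where defaulting every row to 'Normal' first is simply one convenient way to initialise the column:

```python
import pandas as pd

test_data = {'Transaction ID': [101, 102, 103], 'Amount': [500, 1500, 750]}
df = pd.DataFrame(test_data)

# Start every row as 'Normal', then overwrite only the rows where the mask is True
df['Flag'] = 'Normal'
df.loc[df['Amount'] > 1000, 'Flag'] = 'High'
print(df)
```

The explicit loop is still worth keeping when each row's update depends on more involved logic than a single condition.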

Which Method to Use and When

Method          Best For
iterrows()      Small datasets and quick exploratory tasks
itertuples()    Larger datasets where performance matters
apply()         Complex row-wise logic that combines several columns (not truly vectorized with axis=1)
iloc[]/loc[]    Precise control over specific rows, including conditional updates

Conclusion

There are multiple ways to iterate over rows in a pandas DataFrame, and the right choice depends on your task's complexity and dataset size. For small datasets or custom logic, iterrows() or apply() is convenient. For better performance on larger datasets, prefer vectorized operations on whole columns where possible, and use itertuples() when you genuinely need a loop.

Iterating Over Rows in a Pandas DataFrame – FAQs

What is the best way to iterate over the rows of a pandas DataFrame?

For small datasets, iterrows() is convenient; for larger or performance-critical datasets, itertuples() is the better choice.

How do you iterate over multiple rows in pandas?

To iterate over a subset of rows, slice the DataFrame with iloc[] (by position) or loc[] (by label) and loop over the result.

Code:

for _, row in df.loc[0:5].iterrows():
    print(row)
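Note that the two indexers slice differently: loc[] uses labels and includes the end label, while iloc[] uses positions and excludes the end position, like an ordinary Python slice. A small sketch with a hypothetical ten-row frame (the column name value is made up) shows the difference:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..9
df = pd.DataFrame({'value': range(10)})

# loc[0:5] is label-based and inclusive: labels 0 through 5
label_rows = [index for index, _ in df.loc[0:5].iterrows()]

# iloc[0:5] is position-based and exclusive: positions 0 through 4
position_rows = [index for index, _ in df.iloc[0:5].iterrows()]

print(label_rows)     # [0, 1, 2, 3, 4, 5]
print(position_rows)  # [0, 1, 2, 3, 4]
```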

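What are the alternatives to loops in pandas?

Vectorized operations, which act on whole columns at once, are the main alternative and are usually much faster than explicit loops. Methods such as apply() or transform() also remove the explicit loop from your code, although apply() with axis=1 still calls a Python function once per row.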
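As an illustration, the grading loop from Method 1 can be rewritten as a single vectorized expression. This sketch swaps in NumPy's np.where() in place of iterrows(); it picks 'A' or 'B' for every element of the column based on the condition:

```python
import numpy as np
import pandas as pd

test_data = {'Name': ['Eva', 'Bobby', 'Charles'], 'Score': [85, 62, 90]}
df = pd.DataFrame(test_data)

# The condition is evaluated for the whole Score column at once
df['Grade'] = np.where(df['Score'] >= 80, 'A', 'B')
print(df)
```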

Is itertuples() faster than iterrows()?

Yes. itertuples() is generally faster because it does not convert every row into a Series object.
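If you want to check this on your own machine, a rough comparison with the standard library's timeit module might look like the following. The 10,000-row frame and the column names a and b are arbitrary choices for illustration, and the exact timings will vary with your data, hardware, and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test frame: 10,000 rows, two integer columns
df = pd.DataFrame({'a': np.arange(10_000), 'b': np.arange(10_000)})

def total_iterrows():
    # Builds a Series object for every row
    return sum(row['a'] + row['b'] for _, row in df.iterrows())

def total_itertuples():
    # Yields lightweight namedtuples instead
    return sum(row.a + row.b for row in df.itertuples(index=False))

# Both approaches must agree before timing them
assert total_iterrows() == total_itertuples()

t_rows = timeit.timeit(total_iterrows, number=3)
t_tuples = timeit.timeit(total_itertuples, number=3)
print(f"iterrows:   {t_rows:.3f} s")
print(f"itertuples: {t_tuples:.3f} s")
```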
