Filtering is one of the most common DataFrame operations in Pandas. In this post, we’ll look at how to filter a Pandas DataFrame using several conditions at once. Pandas offers more than one way to do this, and the following examples demonstrate three of them: eval(), loc[], and query().
Method 1: Using eval()
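eval() evaluates a string expression against the columns of the DataFrame. When the expression is a condition, the result holds one True or False value per row, so indexing the DataFrame with it returns only the rows that match the condition.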
Syntax
DataFrame_object[DataFrame_object.eval("conditions")]
Example 1
Let’s create a DataFrame with 6 columns and 4 rows, then return the rows where the id is greater than 20 and the name ends with “n”.
import pandas
remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
[21,'siva','fail',400,32,45],
[20,'sahaja','pass',100,78,90],
[22,'suryam','fail',450,76,56]
],columns=['id','name','status','fee','points1','points2'])
print(remarks)
print()
# Return the rows where the id is greater than 20 and the name ends with 'n'.
print(remarks[remarks.eval("id > 20 & name.str.endswith('n').values")])
Output
id name status fee points1 points2
0 23 sravan pass 1000 34 56
1 21 siva fail 400 32 45
2 20 sahaja pass 100 78 90
3 22 suryam fail 450 76 56

id name status fee points1 points2
0 23 sravan pass 1000 34 56
There is only one row where the name ends with ‘n’ and the id is greater than 20. Here, we combined two conditions using the ‘&’ operator.
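To see why this works as a filter, it can help to look at what eval() returns on its own. The sketch below uses purely numeric conditions on the same data (a simplification chosen here, not the condition from the example above) and prints the boolean result before using it to index the DataFrame. The variable names mask and filtered are only illustrative.

```python
import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                            [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]],
                           columns=['id','name','status','fee','points1','points2'])

# eval() returns one True/False value per row for a condition expression.
mask = remarks.eval("id > 20 and points1 < 35")
print(mask.tolist())

# Indexing with the boolean result keeps only the rows marked True.
filtered = remarks[mask]
print(filtered)
```

Only the rows for sravan and siva satisfy both conditions, so those are the two rows that remain after indexing.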
Example 2
Return the rows where the id is greater than 20, ‘points1’ is less than 35, and the name starts with ‘s’.
import pandas
remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
[21,'siva','fail',400,32,45],
[20,'sahaja','pass',100,78,90],
[22,'suryam','fail',450,76,56]
],columns=['id','name','status','fee','points1','points2'])
# Return the rows where id is greater than 20, the name starts with 's', and points1 is less than 35.
print(remarks[remarks.eval("id > 20 & name.str.startswith('s').values & points1 < 35")])
Output
id name status fee points1 points2
0 23 sravan pass 1000 34 56
1 21 siva fail 400 32 45
Two rows match the condition.
Method 2: Using loc[]
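Syntax
DataFrame_object.loc[index_label]
Parameter
Index label: A single label or a list of labels from the row index. loc[] also accepts a boolean Series with one value per row, which is how the examples below filter on conditions.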
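The parameter description covers label-based selection, while the examples that follow pass a boolean condition instead. The short sketch below shows both uses side by side on a trimmed-down copy of the data. Using the name column as the index, and the variable names by_name, by_label, and by_condition, are assumptions made only for this illustration.

```python
import pandas

remarks = pandas.DataFrame([[23,'sravan',1000],
                            [21,'siva',400],
                            [20,'sahaja',100],
                            [22,'suryam',450]],
                           columns=['id','name','fee'])

# Use the name column as the row index so rows can be selected by label.
by_name = remarks.set_index('name')

# A list of index labels selects exactly those rows.
by_label = by_name.loc[['siva', 'suryam']]
print(by_label)

# A boolean Series (one value per row) selects the rows marked True.
by_condition = by_name.loc[by_name['fee'] > 300]
print(by_condition)
```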
Example 1
Create a DataFrame named ‘remarks’ with 6 columns, then return the rows where fee is greater than 300 and points2 is less than 76.
import pandas
remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
[21,'siva','fail',400,32,45],
[20,'sahaja','pass',100,78,90],
[22,'suryam','fail',450,76,56]
],columns=['id','name','status','fee','points1','points2'])
# Display the DataFrame - remarks
print(remarks)
print()
# Return the rows based on the fee column where fee is greater than 300 and points2 less than 76
print(remarks.loc[(remarks['fee'] > 300) & (remarks['points2'] < 76)])
Output
id name status fee points1 points2
0 23 sravan pass 1000 34 56
1 21 siva fail 400 32 45
2 20 sahaja pass 100 78 90
3 22 suryam fail 450 76 56

id name status fee points1 points2
0 23 sravan pass 1000 34 56
1 21 siva fail 400 32 45
3 22 suryam fail 450 76 56
There are 3 rows where the fee is greater than 300 and points2 is less than 76. Here, we combined two conditions with the ‘&’ operator, wrapping each condition in parentheses.
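The parentheses around each condition matter. In Python, & binds more tightly than comparison operators such as > and <, so leaving the parentheses out changes how the expression is grouped. The sketch below is a minimal illustration of that on the same data: the unparenthesized form raises a ValueError, while the parenthesized form filters as intended. The raised and result variables are only there for the demonstration.

```python
import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                            [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]],
                           columns=['id','name','status','fee','points1','points2'])

# Without parentheses, Python groups this as
# remarks['fee'] > (300 & remarks['points2']) < 76,
# a chained comparison that needs a single True/False from a whole Series.
try:
    remarks.loc[remarks['fee'] > 300 & remarks['points2'] < 76]
    raised = False
except ValueError as error:
    raised = True
    print("ValueError:", error)

# With parentheses, each comparison is evaluated first,
# and the two boolean results are then combined row by row.
result = remarks.loc[(remarks['fee'] > 300) & (remarks['points2'] < 76)]
print(result)
```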
Example 2
Using the same ‘remarks’ DataFrame, let’s return the rows where fee is greater than 300, points2 is less than 76, and the status is ‘fail’.
import pandas
remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
[21,'siva','fail',400,32,45],
[20,'sahaja','pass',100,78,90],
[22,'suryam','fail',450,76,56]
],columns=['id','name','status','fee','points1','points2'])
# Return the rows where fee is greater than 300, points2 is less than 76, and the status is 'fail'.
print(remarks.loc[(remarks['fee'] > 300) & (remarks['points2'] < 76) & (remarks['status'] == 'fail')])
Output
id name status fee points1 points2
1 21 siva fail 400 32 45
3 22 suryam fail 450 76 56
There are 2 rows where the fee is greater than 300, points2 is less than 76, and the status is ‘fail’. Here, we combined three conditions with the ‘&’ operator.
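When the number of conditions grows, one option is to give each condition a name before combining them. The sketch below reproduces the three-condition filter that way. The variable names high_fee, low_points2, and failed are illustrative and not part of the original code.

```python
import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                            [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]],
                           columns=['id','name','status','fee','points1','points2'])

# Each condition is a boolean Series with one value per row.
high_fee = remarks['fee'] > 300
low_points2 = remarks['points2'] < 76
failed = remarks['status'] == 'fail'

# Combine the named conditions with & and pass the result to loc[].
result = remarks.loc[high_fee & low_points2 & failed]
print(result)
```

Because each condition is already stored as its own Series, no extra parentheses are needed when combining them.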
Method 3: Using query()
query() takes a condition written as a string expression and returns the rows of the DataFrame for which that expression is true. The whole expression must be enclosed in quotes.
Syntax
DataFrame_object.query("conditions")
Example
Let’s return the rows based on the fee column where fee is greater than 300 and points2 less than 76.
import pandas
remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
[21,'siva','fail',400,32,45],
[20,'sahaja','pass',100,78,90],
[22,'suryam','fail',450,76,56]
],columns=['id','name','status','fee','points1','points2'])
# Return the rows based on the fee column where fee is greater than 300 and points2 less than 76
print(remarks.query("fee>300 and points2 < 76"))
Output
id name status fee points1 points2
0 23 sravan pass 1000 34 56
1 21 siva fail 400 32 45
3 22 suryam fail 450 76 56
There are 3 rows where the fee is greater than 300 and points2 less than 76. Here, we specified two conditions using the ‘and’ operator.
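The thresholds do not have to be hard-coded in the expression string. query() can refer to Python variables from the surrounding scope by prefixing them with @. The sketch below repeats the filter above with two such variables (min_fee and max_points2 are names chosen here for illustration) and checks that it selects the same rows as the equivalent loc[] filter.

```python
import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                            [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]],
                           columns=['id','name','status','fee','points1','points2'])

# Hypothetical thresholds kept in ordinary Python variables.
min_fee = 300
max_points2 = 76

# Inside the expression, @name refers to a Python variable rather than a column.
from_query = remarks.query("fee > @min_fee and points2 < @max_points2")

# The same filter written with loc[] for comparison.
from_loc = remarks.loc[(remarks['fee'] > min_fee) & (remarks['points2'] < max_points2)]

print(from_query)
print(from_query.equals(from_loc))
```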
Conclusion
Filtering is one of the most frequently used DataFrame operations in Pandas. In this guide, we discussed how to filter a DataFrame using multiple conditions. The examples showed how to extract rows with eval(), loc[], and query(), combining conditions with the ‘&’ and ‘and’ operators. After working through them, you should be able to filter your own data on multiple conditions.