A weighted average is an average in which some values count more than others: each value is multiplied by a weight that reflects its importance before the values are combined. In this article, we will walk through three ways to calculate a weighted average for a Pandas DataFrame, each with an example.
Formula
weighted average = sum(values_column * weights_column) / sum(weights_column)
Here, values_column is the numeric column in the Pandas DataFrame that stores the values, and weights_column is the numeric column that stores the weight of each value.
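As a quick illustration of the weighted average calculation, here is a minimal sketch in plain Python, without Pandas. The scores and weights are hypothetical sample data chosen only for this illustration.

```python
# Hypothetical sample data: three scores and the weight given to each one
values = [80, 90, 70]
weights = [5, 3, 2]

# Multiply each value by its weight and add up the products
weighted_sum = sum(v * w for v, w in zip(values, weights))

# Divide by the total weight to get the weighted average
total_weight = sum(weights)
weighted_average = weighted_sum / total_weight

# The plain (unweighted) mean, for comparison
plain_average = sum(values) / len(values)

print(weighted_average)
print(plain_average)
```

Because the first score carries the largest weight, the weighted average is pulled toward it and differs from the plain mean.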
Method 1: Return Weighted Average
Let’s write a custom function that computes the weighted average of a Pandas DataFrame. It uses sum() to total both the weighted values and the weights in the following computation:
sum(DataFrame_object[weight_data] * DataFrame_object[value_data]) / DataFrame_object[weight_data].sum()
Here, weight_data is the column in the DataFrame that holds the weights for the values in the value_data column.
Example
In this example, we have a DataFrame named ‘calculations’ with 2 columns of integer type. Now, we will create a custom function, ‘weighted_avg_calculation’, to calculate the weighted average and call the function with these two columns by passing them as arguments.
import pandas
# Create the DataFrame with 2 columns and 5 rows
calculations = pandas.DataFrame.from_dict({'count': [7, 8, 9, 0, 4],
                                           'quantity': [2, 3, 4, 5, 2]})
# Display the DataFrame - calculations
print(calculations)
# Custom function that calculates the weighted average:
# sum of (weight * value) divided by the sum of the weights
def weighted_avg_calculation(calculations, value_data, weight_data):
    return sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()
print()
# Call the function by passing the DataFrame, 'quantity' as value_data and 'count' as weight_data
print(weighted_avg_calculation(calculations, 'quantity', 'count'))
Output
   count  quantity
0      7         2
1      8         3
2      9         4
3      0         5
4      4         2

2.9285714285714284
Explanation
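The custom function multiplies each value in the ‘quantity’ column by its weight in the ‘count’ column, adds up the products, and divides the total by the sum of the weights:
(7*2 + 8*3 + 9*4 + 0*5 + 4*2) / (7 + 8 + 9 + 0 + 4) = 82 / 28
So, the weighted average of the above DataFrame is approximately 2.93.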
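The calculation divides by the sum of the weights, so it breaks down when the weights add up to zero. The following is a minimal sketch of one way to guard against that case, not part of the original method. The function name safe_weighted_avg and the choice to return NaN are assumptions made for this illustration.

```python
import pandas

def safe_weighted_avg(frame, value_col, weight_col):
    """Weighted average of frame[value_col], using frame[weight_col] as weights."""
    weights = frame[weight_col]
    total_weight = weights.sum()
    if total_weight == 0:
        # Assumption for this sketch: report NaN when no weight is available
        return float("nan")
    # Vectorized version of sum(weight * value) / sum(weight)
    return (frame[value_col] * weights).sum() / total_weight

calculations = pandas.DataFrame({'count': [7, 8, 9, 0, 4],
                                 'quantity': [2, 3, 4, 5, 2]})
print(safe_weighted_avg(calculations, 'quantity', 'count'))

# A hypothetical DataFrame whose weights are all zero
zero_weights = pandas.DataFrame({'count': [0, 0], 'quantity': [1, 2]})
print(safe_weighted_avg(zero_weights, 'quantity', 'count'))
```

Using the Series sum() method for both totals keeps the whole computation inside Pandas, which tends to matter more as the DataFrame grows.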
Method 2: Return Weighted Average in Groups
Now, we will use the groupby() function to group the rows and return the weighted average of each group. The apply() method is called on the result of groupby(); it takes the weighted average function and the column names as parameters.
DataFrame_object.groupby('grouping_column').apply(weighted_avg_calculation, 'value_data', 'weight_data')
Here, the rows are grouped based on the values in the ‘grouping_column’. The weighted_avg_calculation is a custom function that computes the weighted average. The weight_data is the column in the DataFrame that holds the weights for the values in the value_data column.
Example
In this example, we have a DataFrame named ‘calculations’ with 3 columns. Now, we will create a custom function, ‘weighted_avg_calculation’, to calculate the weighted average and call the function with the two columns by passing them as arguments. We will group the rows based on the ‘item’ column and return the weighted average in each group.
import pandas
# Create the DataFrame with 3 columns and 5 rows
calculations = pandas.DataFrame.from_dict({'count': [12, 34, 56, 10, 15],
                                           'quantity': [100, 200, 345, 670, 50],
                                           'item': ['plastic', 'iron', 'iron', 'steel', 'plastic']})
# Display the DataFrame - calculations
print(calculations)
# Custom function that calculates the weighted average:
# sum of (weight * value) divided by the sum of the weights
def weighted_avg_calculation(calculations, value_data, weight_data):
    return sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()
print()
# Group the rows by 'item' and compute the weighted average within each group
print(calculations.groupby('item').apply(weighted_avg_calculation, 'quantity', 'count'))
Output
   count  quantity     item
0     12       100  plastic
1     34       200     iron
2     56       345     iron
3     10       670    steel
4     15        50  plastic

item
iron       290.222222
plastic     72.222222
steel      670.000000
dtype: float64
Explanation
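The custom function is applied to each group separately, so only the rows that share the same ‘item’ value are combined. For the ‘iron’ group, for example, the calculation is:
(34*200 + 56*345) / (34 + 56) = 26120 / 90
There are three groups in the calculations DataFrame: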
- The weighted average for the ‘iron’ group is 290.22
- The weighted average for the ‘plastic’ group is 72.22
- The weighted average for the ‘steel’ group is 670.00
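The same per-group results can also be obtained without apply(), by summing the weighted values and the weights per group and then dividing. This is a minimal sketch of that alternative, not the method used in the example above. The helper column name ‘weighted’ is hypothetical.

```python
import pandas

calculations = pandas.DataFrame({'count': [12, 34, 56, 10, 15],
                                 'quantity': [100, 200, 345, 670, 50],
                                 'item': ['plastic', 'iron', 'iron', 'steel', 'plastic']})

# 'weighted' is a hypothetical helper column holding value * weight for each row
totals = (calculations
          .assign(weighted=calculations['quantity'] * calculations['count'])
          .groupby('item')[['weighted', 'count']]
          .sum())

# Per-group weighted average = sum of products / sum of weights
group_weighted_avg = totals['weighted'] / totals['count']
print(group_weighted_avg)
```

This version relies only on built-in group sums, which may be preferable when calling a Python function once per group becomes slow.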
Method 3: Return Weighted Average Using NumPy
The NumPy module provides the average() function, to which we can pass the values and the weights to get the weighted average of a Pandas DataFrame column.
- In the first parameter, we need to pass the values column.
- In the second parameter, we assign the weights column to the weights keyword argument.
numpy.average(DataFrame_object['value_data'], weights=DataFrame_object['weight_data'])
Example
In this example, we have a DataFrame named ‘calculations’ with 2 columns. We will directly use numpy.average() to calculate the weighted average.
import numpy
import pandas
# Create the DataFrame with 2 columns and 5 rows
calculations = pandas.DataFrame.from_dict({'count': [12, 34, 56, 10, 15],
                                           'quantity': [100, 200, 345, 670, 50]})
# Display the DataFrame - calculations
print(calculations)
print()
# 'quantity' holds the values and 'count' holds the weights
print(numpy.average(calculations['quantity'], weights=calculations['count']))
Output
   count  quantity
0     12       100
1     34       200
2     56       345
3     10       670
4     15        50

273.7795275590551
Explanation
Here, the ‘quantity’ column supplies the values, and the ‘count’ column supplies the weights.
The weighted average is approximately 273.78.
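As a cross-check, the following minimal sketch compares numpy.average() with the manual sum-based calculation from Method 1 on the same data. It also uses the returned=True option of numpy.average(), which additionally returns the sum of the weights. The variable names are chosen only for this illustration.

```python
import numpy
import pandas

calculations = pandas.DataFrame({'count': [12, 34, 56, 10, 15],
                                 'quantity': [100, 200, 345, 670, 50]})

# returned=True makes numpy.average return a tuple: (average, sum_of_weights)
average, weight_total = numpy.average(calculations['quantity'],
                                      weights=calculations['count'],
                                      returned=True)

# The same computation written out by hand, as in Method 1
manual = ((calculations['quantity'] * calculations['count']).sum()
          / calculations['count'].sum())

print(average, weight_total)
print(numpy.isclose(average, manual))
```

Both approaches should agree up to floating-point rounding, so the choice between them is mostly a matter of readability.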
Conclusion
A weighted average is straightforward to compute for a Pandas DataFrame. We covered three approaches: a custom function built on sum(), the same function applied to each group with groupby() and apply(), and NumPy’s average() function with the weights parameter. Averages come up almost everywhere, from the budget of a small grocery to datasets with millions of rows, and a weighted average is the right choice whenever some values should count more than others.