Pandas
Tomas Beuzen, September 2020
These exercises complement Chapter 7.
Exercises
1.
In this set of practice exercises we’ll be investigating the carbon footprint of different foods. We’ll be leveraging a dataset compiled by Kasia Kulma and contributed to R’s Tidy Tuesday project.
Start by importing pandas with the alias pd.
# Your answer here.
2.
The dataset we’ll be working with has the following columns:
| column | description |
|---|---|
| country | Country name |
| food_category | Food category |
| consumption | Consumption (kg/person/year) |
| co2_emmission | CO2 emission (kg CO2/person/year) |
Import the dataset as a dataframe named df from this URL: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv
# Your answer here.
6.
How many different kinds of foods are there in the dataset? How many countries are in the dataset?
# Your answer here.
7.
What is the maximum co2_emmission in the dataset and which food type and country does it belong to?
# Your answer here.
8.
How many countries produce more than 1000 Kg CO2/person/year for at least one food type?
# Your answer here.
11.
What are the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?
# Your answer here.
12.
What are the total emissions of all other (non-meat) products in the dataset combined?
# Your answer here.
Solutions
1.
In this set of practice exercises we’ll be investigating the carbon footprint of different foods. We’ll be leveraging a dataset compiled by Kasia Kulma and contributed to R’s Tidy Tuesday project.
Start by importing pandas with the alias pd.
import pandas as pd
2.
The dataset we’ll be working with has the following columns:
| column | description |
|---|---|
| country | Country name |
| food_category | Food category |
| consumption | Consumption (kg/person/year) |
| co2_emmission | CO2 emission (kg CO2/person/year) |
Import the dataset as a dataframe named df from this URL: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv"
df = pd.read_csv(url)
df
| | country | food_category | consumption | co2_emmission |
|---|---|---|---|---|
| 0 | Argentina | Pork | 10.51 | 37.20 |
| 1 | Argentina | Poultry | 38.66 | 41.53 |
| 2 | Argentina | Beef | 55.48 | 1712.00 |
| 3 | Argentina | Lamb & Goat | 1.56 | 54.63 |
| 4 | Argentina | Fish | 4.36 | 6.96 |
| ... | ... | ... | ... | ... |
| 1425 | Bangladesh | Milk - inc. cheese | 21.91 | 31.21 |
| 1426 | Bangladesh | Wheat and Wheat Products | 17.47 | 3.33 |
| 1427 | Bangladesh | Rice | 171.73 | 219.76 |
| 1428 | Bangladesh | Soybeans | 0.61 | 0.27 |
| 1429 | Bangladesh | Nuts inc. Peanut Butter | 0.72 | 1.27 |
1430 rows × 4 columns
4.
What is the type of data in each column of df?
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1430 entries, 0 to 1429
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 country 1430 non-null object
1 food_category 1430 non-null object
2 consumption 1430 non-null float64
3 co2_emmission 1430 non-null float64
dtypes: float64(2), object(2)
memory usage: 44.8+ KB
6.
How many different kinds of foods are there in the dataset? How many countries are in the dataset?
print(f"There are {df['food_category'].nunique()} foods.")
print(f"There are {df['country'].nunique()} countries.")
There are 11 foods.
There are 130 countries.
7.
What is the maximum co2_emmission in the dataset and which food type and country does it belong to?
# idxmax() returns an index label, so look the row up by label with .loc
df.loc[df['co2_emmission'].idxmax()]
country Argentina
food_category Beef
consumption 55.48
co2_emmission 1712
Name: 2, dtype: object
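A note on the lookup: idxmax() returns an index label, not an integer position. The two coincide for df because it has a default RangeIndex, but they can differ after filtering or re-indexing. The sketch below illustrates this on a small hypothetical frame called sample. Its three rows reuse values from the output above, and the index labels 10, 20, 30 are invented for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for df with a non-default index (labels are made up).
sample = pd.DataFrame(
    {
        "country": ["Argentina", "Argentina", "Argentina"],
        "food_category": ["Pork", "Poultry", "Beef"],
        "co2_emmission": [37.20, 41.53, 1712.00],
    },
    index=[10, 20, 30],
)

# idxmax() gives the label of the row holding the maximum value.
label = sample["co2_emmission"].idxmax()

# Label-based lookup matches what idxmax() returned.
row = sample.loc[label]
print(label)
print(row["food_category"])
```

With this index, sample.iloc[label] would raise an IndexError, since position 30 does not exist in a three-row frame.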
8.
How many countries produce more than 1000 Kg CO2/person/year for at least one food type?
df.query("co2_emmission > 1000")
| | country | food_category | consumption | co2_emmission |
|---|---|---|---|---|
| 2 | Argentina | Beef | 55.48 | 1712.00 |
| 13 | Australia | Beef | 33.86 | 1044.85 |
| 57 | USA | Beef | 36.24 | 1118.29 |
| 90 | Brazil | Beef | 39.25 | 1211.17 |
| 123 | Bermuda | Beef | 33.15 | 1022.94 |
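The query lists the matching rows, and the five rows shown belong to five different countries, so the answer is 5. To get that number directly, one option is to count the distinct countries among the filtered rows. This is a minimal sketch on a hypothetical frame called sample, built from a few rows that appear in the outputs on this page so that it runs without downloading the data:

```python
import pandas as pd

# Hypothetical stand-in for df: a handful of rows copied from the outputs shown.
sample = pd.DataFrame({
    "country": ["Argentina", "Argentina", "Australia", "USA", "Liberia"],
    "food_category": ["Beef", "Pork", "Beef", "Beef", "Beef"],
    "consumption": [55.48, 10.51, 33.86, 36.24, 0.78],
    "co2_emmission": [1712.00, 37.20, 1044.85, 1118.29, 24.07],
})

# Filter on the threshold, then count distinct countries so that a country
# exceeding it for several food types is only counted once.
n_countries = sample.query("co2_emmission > 1000")["country"].nunique()
print(n_countries)
```

Using nunique() rather than the row count matters only if one country can exceed the threshold for more than one food type.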
9.
Which country consumes the least amount of beef per person per year?
(df.query("food_category == 'Beef'")
.sort_values(by="consumption")
.head(1))
| | country | food_category | consumption | co2_emmission |
|---|---|---|---|---|
| 1410 | Liberia | Beef | 0.78 | 24.07 |
10.
Which country consumes the most soybeans per person per year?
(df.query("food_category == 'Soybeans'")
.sort_values(by="consumption", ascending=False)
.head(1))
| | country | food_category | consumption | co2_emmission |
|---|---|---|---|---|
| 1010 | Taiwan. ROC | Soybeans | 16.95 | 7.63 |
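The sort-then-head pattern used in questions 9 and 10 can also be written with DataFrame.nsmallest and DataFrame.nlargest, which return the rows with the n smallest or largest values of a column. This is a sketch of that alternative on a hypothetical frame called sample, with rows taken from the outputs on this page; it is not the approach the solutions use.

```python
import pandas as pd

# Hypothetical stand-in for df: a few rows copied from the outputs shown.
sample = pd.DataFrame({
    "country": ["Argentina", "Liberia", "Australia", "Taiwan. ROC", "Bangladesh"],
    "food_category": ["Beef", "Beef", "Beef", "Soybeans", "Soybeans"],
    "consumption": [55.48, 0.78, 33.86, 16.95, 0.61],
})

# Row with the smallest beef consumption.
beef = sample[sample["food_category"] == "Beef"]
least_beef = beef.nsmallest(1, "consumption")

# Row with the largest soybean consumption.
soy = sample[sample["food_category"] == "Soybeans"]
most_soy = soy.nlargest(1, "consumption")

print(least_beef["country"].iloc[0])
print(most_soy["country"].iloc[0])
```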
11.
What are the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?
meat = ['Poultry', 'Pork', 'Fish', 'Lamb & Goat', 'Beef']
# Select rows and column in a single .loc call instead of chained indexing
df.loc[df['food_category'].isin(meat), 'co2_emmission'].sum()
74441.13
12.
What are the total emissions of all other (non-meat) products in the dataset combined?
meat = ['Poultry', 'Pork', 'Fish', 'Lamb & Goat', 'Beef']
# ~ negates the boolean mask, keeping every food category not in meat
df.loc[~df['food_category'].isin(meat), 'co2_emmission'].sum()
31927.98
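Questions 11 and 12 split the same column by the same condition, so both totals can be computed in one pass by grouping on a meat / non-meat label. This is a sketch of that idea, not the method used in the solutions. The frame sample is a hypothetical stand-in with rows copied from the first output on this page, and the labels "meat" and "non-meat" are names chosen for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for df: a few rows copied from the outputs shown.
sample = pd.DataFrame({
    "food_category": ["Pork", "Poultry", "Beef", "Lamb & Goat", "Fish",
                      "Rice", "Soybeans"],
    "co2_emmission": [37.20, 41.53, 1712.00, 54.63, 6.96, 219.76, 0.27],
})

meat = ["Poultry", "Pork", "Fish", "Lamb & Goat", "Beef"]

# Label each row as meat or non-meat (label names are illustrative).
group = sample["food_category"].isin(meat).map({True: "meat", False: "non-meat"})

# Sum emissions within each label.
totals = sample.groupby(group)["co2_emmission"].sum()
print(totals)
```

Applied to the full df, the same grouping should reproduce the two sums reported above.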