Pandas

Tomas Beuzen, September 2020

These exercises complement Chapter 7.

Exercises

1.

In this set of practice exercises, we’ll investigate the carbon footprint of different foods using a dataset compiled by Kasia Kulma and contributed to R’s Tidy Tuesday project.

Start by importing pandas with the alias pd.

# Your answer here.

2.

The dataset we’ll be working with has the following columns:

column          description
country         Country Name
food_category   Food Category
consumption     Consumption (kg/person/year)
co2_emmission   CO2 Emission (kg CO2/person/year)

Read the dataset into a dataframe named df from this URL: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv

# Your answer here.

3.

How many rows and columns are there in the dataframe?

# Your answer here.

4.

What is the type of data in each column of df?

# Your answer here.

5.

What is the mean co2_emmission across the whole dataset?

# Your answer here.

6.

How many different kinds of foods are there in the dataset? How many countries are in the dataset?

# Your answer here.

7.

What is the maximum co2_emmission in the dataset and which food type and country does it belong to?

# Your answer here.

8.

How many countries produce more than 1000 kg CO2/person/year for at least one food type?

# Your answer here.

9.

Which country consumes the least amount of beef per person per year?

# Your answer here.

10.

Which country consumes the most soybeans per person per year?

# Your answer here.

11.

What are the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?

# Your answer here.

12.

What are the total emissions of all other (non-meat) products in the dataset combined?

# Your answer here.



Solutions

1.

In this set of practice exercises, we’ll investigate the carbon footprint of different foods using a dataset compiled by Kasia Kulma and contributed to R’s Tidy Tuesday project.

Start by importing pandas with the alias pd.

import pandas as pd

2.

The dataset we’ll be working with has the following columns:

column          description
country         Country Name
food_category   Food Category
consumption     Consumption (kg/person/year)
co2_emmission   CO2 Emission (kg CO2/person/year)

Read the dataset into a dataframe named df from this URL: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv

url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv"
df = pd.read_csv(url)
df
country food_category consumption co2_emmission
0 Argentina Pork 10.51 37.20
1 Argentina Poultry 38.66 41.53
2 Argentina Beef 55.48 1712.00
3 Argentina Lamb & Goat 1.56 54.63
4 Argentina Fish 4.36 6.96
... ... ... ... ...
1425 Bangladesh Milk - inc. cheese 21.91 31.21
1426 Bangladesh Wheat and Wheat Products 17.47 3.33
1427 Bangladesh Rice 171.73 219.76
1428 Bangladesh Soybeans 0.61 0.27
1429 Bangladesh Nuts inc. Peanut Butter 0.72 1.27

1430 rows × 4 columns
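If you want to experiment without a network connection, note that `pd.read_csv` accepts any file-like object, not only a URL or a path. The minimal sketch below feeds it a few CSV lines through `io.StringIO`; the rows (and the country names) are made-up toy values that only mimic the four columns of the real file, and `toy` is an illustrative name.

```python
import io

import pandas as pd

# Made-up toy rows that mimic the layout of food_consumption.csv
csv_text = """country,food_category,consumption,co2_emmission
Atlantis,Pork,10.0,35.0
Atlantis,Beef,20.0,600.0
Lemuria,Pork,5.0,17.5
Lemuria,Beef,40.0,1200.0
"""

# read_csv accepts file-like objects, so StringIO stands in for the URL
toy = pd.read_csv(io.StringIO(csv_text))
print(toy)
```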

3.

How many rows and columns are there in the dataframe?

df.shape
(1430, 4)
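Because `df.shape` is an ordinary `(rows, columns)` tuple, it can be unpacked into two names; `len(df)` and `len(df.columns)` give the same two numbers. A small sketch on a made-up toy dataframe (`toy`, `n_rows` and `n_cols` are illustrative names):

```python
import pandas as pd

# Made-up toy dataframe with the same column names as df
toy = pd.DataFrame({
    "country": ["Atlantis", "Atlantis", "Lemuria"],
    "food_category": ["Pork", "Beef", "Pork"],
    "consumption": [10.0, 20.0, 5.0],
    "co2_emmission": [35.0, 600.0, 17.5],
})

# shape is a plain tuple, so it unpacks into two names
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")  # 3 rows, 4 columns

# len() counts rows; len(toy.columns) counts columns
assert (len(toy), len(toy.columns)) == toy.shape
```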

4.

What is the type of data in each column of df?

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1430 entries, 0 to 1429
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   country        1430 non-null   object 
 1   food_category  1430 non-null   object 
 2   consumption    1430 non-null   float64
 3   co2_emmission  1430 non-null   float64
dtypes: float64(2), object(2)
memory usage: 44.8+ KB
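`df.info()` prints its summary and returns `None`. When you need the column types as data, for example to pick out the numeric columns, `df.dtypes` returns them as a Series and `select_dtypes` filters on them. A sketch on made-up toy values (`toy` and `numeric_cols` are illustrative names):

```python
import pandas as pd

# Made-up toy dataframe with the same column names as df
toy = pd.DataFrame({
    "country": ["Atlantis", "Lemuria"],
    "food_category": ["Pork", "Beef"],
    "consumption": [10.0, 40.0],
    "co2_emmission": [35.0, 1200.0],
})

# dtypes is a Series mapping column name -> dtype
print(toy.dtypes)

# select_dtypes keeps only the columns of the requested kind
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['consumption', 'co2_emmission']
```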

5.

What is the mean co2_emmission across the whole dataset?

df["co2_emmission"].mean()
74.383993006993
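By default `.mean()` skips missing values (`skipna=True`), so the result is the average over the non-null entries only. That makes no difference here, since `df.info()` reported 1430 non-null values in every column, but it matters on messier data. A sketch on a made-up toy Series:

```python
import numpy as np
import pandas as pd

# Made-up toy emissions with one missing value
s = pd.Series([35.0, 600.0, np.nan, 1200.0], name="co2_emmission")

print(s.mean())              # NaN ignored: (35 + 600 + 1200) / 3
print(s.mean(skipna=False))  # nan: the missing value propagates
```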

6.

How many different kinds of foods are there in the dataset? How many countries are in the dataset?

print(f"There are {df['food_category'].nunique()} foods.")
print(f"There are {df['country'].nunique()} countries.")
There are 11 foods.
There are 130 countries.
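`nunique()` only returns a count. To see which values are being counted, `unique()` returns them in order of first appearance, and `value_counts()` also reports how many rows each value has. A sketch on made-up toy rows (`toy` is an illustrative name):

```python
import pandas as pd

# Made-up toy rows
toy = pd.DataFrame({
    "country": ["Atlantis", "Atlantis", "Lemuria", "Lemuria"],
    "food_category": ["Pork", "Beef", "Pork", "Beef"],
})

print(toy["food_category"].nunique())  # 2
print(toy["food_category"].unique())   # the distinct foods themselves
print(toy["country"].value_counts())   # number of rows per country
```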

7.

What is the maximum co2_emmission in the dataset and which food type and country does it belong to?

df.loc[df['co2_emmission'].idxmax()]
country          Argentina
food_category         Beef
consumption          55.48
co2_emmission         1712
Name: 2, dtype: object
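`idxmax()` returns the index *label* of the maximum, not its position, so it pairs with `.loc`. With the default `RangeIndex` that `read_csv` creates, labels and positions coincide, which is why `.iloc` would also happen to work on `df`. The sketch below uses made-up toy rows with a non-default index (labels 10, 20, 30), where the two differ:

```python
import pandas as pd

# Made-up toy rows with a non-default index
toy = pd.DataFrame(
    {
        "country": ["Atlantis", "Lemuria", "Mu"],
        "food_category": ["Pork", "Beef", "Fish"],
        "co2_emmission": [35.0, 1200.0, 6.0],
    },
    index=[10, 20, 30],
)

label = toy["co2_emmission"].idxmax()  # 20: a label, not a position
print(toy.loc[label])                  # the Lemuria / Beef row
# toy.iloc[label] would raise IndexError: position 20 is out of bounds
```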

8.

How many countries produce more than 1000 kg CO2/person/year for at least one food type?

df.query("co2_emmission > 1000")
country food_category consumption co2_emmission
2 Argentina Beef 55.48 1712.00
13 Australia Beef 33.86 1044.85
57 USA Beef 36.24 1118.29
90 Brazil Beef 39.25 1211.17
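123 Bermuda Beef 33.15 1022.94

Five countries (Argentina, Australia, USA, Brazil and Bermuda) exceed 1000 kg CO2/person/year for at least one food, and in every case that food is beef.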
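The query lists the qualifying rows, while the question asks for a number of countries. Selecting the `country` column and calling `nunique()` returns that number directly, and it avoids double-counting a country that exceeds the threshold for more than one food. A sketch on made-up toy values (`toy` and `n_countries` are illustrative names):

```python
import pandas as pd

# Made-up toy rows: Atlantis exceeds 1000 for two foods
toy = pd.DataFrame({
    "country": ["Atlantis", "Atlantis", "Lemuria", "Mu"],
    "food_category": ["Beef", "Lamb & Goat", "Beef", "Beef"],
    "co2_emmission": [1500.0, 1100.0, 900.0, 1050.0],
})

# Keep rows above the threshold, then count distinct countries among them
n_countries = toy.query("co2_emmission > 1000")["country"].nunique()
print(n_countries)  # 2: Atlantis (counted once) and Mu
```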

9.

Which country consumes the least amount of beef per person per year?

(df.query("food_category == 'Beef'")
   .sort_values(by="consumption")
   .head(1))
country food_category consumption co2_emmission
1410 Liberia Beef 0.78 24.07
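Sorting the whole subset only to keep its first row works, but `nsmallest(n, column)` states the intent directly, and `nlargest` is its mirror image. A sketch on made-up toy rows (`toy` and `least_beef` are illustrative names):

```python
import pandas as pd

# Made-up toy rows; the Pork row has the lowest consumption overall
toy = pd.DataFrame({
    "country": ["Atlantis", "Lemuria", "Mu", "Atlantis"],
    "food_category": ["Beef", "Beef", "Beef", "Pork"],
    "consumption": [20.0, 0.5, 7.0, 0.1],
})

# Filter to beef first, then keep the single smallest-consumption row
least_beef = toy.query("food_category == 'Beef'").nsmallest(1, "consumption")
print(least_beef)  # Lemuria, 0.5 (the Pork row at 0.1 was filtered out)
```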

10.

Which country consumes the most soybeans per person per year?

(df.query("food_category == 'Soybeans'")
   .sort_values(by="consumption", ascending=False)
   .head(1))
country food_category consumption co2_emmission
1010 Taiwan. ROC Soybeans 16.95 7.63
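To find the top consumer of every food at once rather than one food at a time, one option is to group by `food_category`, take `idxmax()` of `consumption` within each group, and pass the resulting labels to `.loc`. A sketch on made-up toy rows (`toy`, `top_labels` and `top_consumers` are illustrative names):

```python
import pandas as pd

# Made-up toy rows
toy = pd.DataFrame({
    "country": ["Atlantis", "Lemuria", "Atlantis", "Lemuria"],
    "food_category": ["Soybeans", "Soybeans", "Beef", "Beef"],
    "consumption": [3.0, 9.0, 25.0, 12.0],
})

# For each food, the index label of its highest-consumption row
top_labels = toy.groupby("food_category")["consumption"].idxmax()

# Look those rows up by label to get one row per food
top_consumers = toy.loc[top_labels]
print(top_consumers)
```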

11.

What are the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?

meat = ['Poultry', 'Pork', 'Fish', 'Lamb & Goat', 'Beef']
df.loc[df['food_category'].isin(meat), "co2_emmission"].sum()
74441.13
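The same boolean mask can be reused to see how the meat total splits across individual foods: filter the rows, group by `food_category`, and sum. A sketch on made-up toy values (`toy`, `is_meat` and `per_food` are illustrative names):

```python
import pandas as pd

# Made-up toy values
toy = pd.DataFrame({
    "food_category": ["Beef", "Pork", "Beef", "Rice", "Fish"],
    "co2_emmission": [600.0, 35.0, 1200.0, 200.0, 7.0],
})

meat = ["Poultry", "Pork", "Fish", "Lamb & Goat", "Beef"]
is_meat = toy["food_category"].isin(meat)

# Emissions per meat product; together they add up to the meat total
per_food = toy.loc[is_meat].groupby("food_category")["co2_emmission"].sum()
print(per_food)
assert per_food.sum() == toy.loc[is_meat, "co2_emmission"].sum()
```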

12.

What are the total emissions of all other (non-meat) products in the dataset combined?

meat = ['Poultry', 'Pork', 'Fish', 'Lamb & Goat', 'Beef']
df.loc[~df['food_category'].isin(meat), "co2_emmission"].sum()
31927.98
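The meat and non-meat totals can also be computed in one step by grouping on the boolean mask itself, which splits the rows into a `False` (non-meat) group and a `True` (meat) group. A sketch on made-up toy values (`toy` and `totals` are illustrative names):

```python
import pandas as pd

# Made-up toy values
toy = pd.DataFrame({
    "food_category": ["Beef", "Pork", "Rice", "Soybeans", "Fish"],
    "co2_emmission": [600.0, 35.0, 200.0, 0.5, 7.0],
})

meat = ["Poultry", "Pork", "Fish", "Lamb & Goat", "Beef"]

# Group by a boolean Series: True -> meat, False -> everything else
totals = toy.groupby(toy["food_category"].isin(meat))["co2_emmission"].sum()
print(totals)
```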