Project: A Visual History of Nobel Prize Winners

The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the nobel.csv file in the datasets folder.

In this project, you’ll get a chance to explore and answer several questions about this prizewinning data. We then encourage you to explore further questions that interest you!

# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Read in the Nobel Prize data and inspect its columns
nobel = pd.read_csv("datasets/nobel.csv")
nobel.columns
Index(['year', 'category', 'prize', 'motivation', 'prize_share', 'laureate_id',
       'laureate_type', 'full_name', 'birth_date', 'birth_city',
       'birth_country', 'sex', 'organization_name', 'organization_city',
       'organization_country', 'death_date', 'death_city', 'death_country'],
      dtype='object')
nobel.head(3)
year category prize motivation prize_share laureate_id laureate_type full_name birth_date birth_city birth_country sex organization_name organization_city organization_country death_date death_city death_country
0 1901 Chemistry The Nobel Prize in Chemistry 1901 "in recognition of the extraordinary services ... 1/1 160 Individual Jacobus Henricus van 't Hoff 1852-08-30 Rotterdam Netherlands Male Berlin University Berlin Germany 1911-03-01 Berlin Germany
1 1901 Literature The Nobel Prize in Literature 1901 "in special recognition of his poetic composit... 1/1 569 Individual Sully Prudhomme 1839-03-16 Paris France Male NaN NaN NaN 1907-09-07 Châtenay France
2 1901 Medicine The Nobel Prize in Physiology or Medicine 1901 "for his work on serum therapy, especially its... 1/1 293 Individual Emil Adolf von Behring 1854-03-15 Hansdorf (Lawice) Prussia (Poland) Male Marburg University Marburg Germany 1917-03-31 Marburg Germany

Task 1

What is the most commonly awarded gender and birth country? Store the string answers as top_gender and top_country.

count_by_gender = nobel.value_counts("sex")
top_gender=count_by_gender.index[0]
top_gender
'Male'
count_by_country=nobel.value_counts("birth_country")
top_country = count_by_country.index[0]
top_country
'United States of America'
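
As a cross-check on the value_counts approach above, the most frequent value can also be read off with Series.mode(). The sketch below uses a small made-up frame (the toy name and its rows are hypothetical stand-ins, not records from nobel.csv) so it runs without the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for nobel.csv (not real records)
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", None, "Male"],
    "birth_country": ["France", "France", "Germany", None, "France"],
})

# mode() returns the most frequent value(s) as a Series and skips missing values;
# when several values tie for first place, all of them are returned
toy_top_gender = toy["sex"].mode().iloc[0]
toy_top_country = toy["birth_country"].mode().iloc[0]

# Both approaches should agree when there is a single most frequent value
assert toy_top_gender == toy["sex"].value_counts().index[0]
print(toy_top_gender, toy_top_country)
```

Unlike value_counts().index[0], mode() makes ties visible: a result with more than one element means no single value is the most common.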

Task 2

What decade had the highest proportion of US-born winners? Store this as an integer called max_decade_usa.

# Flag US-born laureates with a boolean column
nobel["USA_born_winers"] = nobel['birth_country']=='United States of America'
# Floor each year to the start of its decade, e.g. 1903 -> 1900
nobel["decade"] = (nobel["year"] // 10) * 10
nobel["decade"].unique()
array([1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000,
       2010, 2020])
nobel.head(5)
year category prize motivation prize_share laureate_id laureate_type full_name birth_date birth_city birth_country sex organization_name organization_city organization_country death_date death_city death_country USA_born_winers decade
0 1901 Chemistry The Nobel Prize in Chemistry 1901 "in recognition of the extraordinary services ... 1/1 160 Individual Jacobus Henricus van 't Hoff 1852-08-30 Rotterdam Netherlands Male Berlin University Berlin Germany 1911-03-01 Berlin Germany False 1900
1 1901 Literature The Nobel Prize in Literature 1901 "in special recognition of his poetic composit... 1/1 569 Individual Sully Prudhomme 1839-03-16 Paris France Male NaN NaN NaN 1907-09-07 Châtenay France False 1900
2 1901 Medicine The Nobel Prize in Physiology or Medicine 1901 "for his work on serum therapy, especially its... 1/1 293 Individual Emil Adolf von Behring 1854-03-15 Hansdorf (Lawice) Prussia (Poland) Male Marburg University Marburg Germany 1917-03-31 Marburg Germany False 1900
3 1901 Peace The Nobel Peace Prize 1901 NaN 1/2 462 Individual Jean Henry Dunant 1828-05-08 Geneva Switzerland Male NaN NaN NaN 1910-10-30 Heiden Switzerland False 1900
4 1901 Peace The Nobel Peace Prize 1901 NaN 1/2 463 Individual Frédéric Passy 1822-05-20 Paris France Male NaN NaN NaN 1912-06-12 Paris France False 1900
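# Total number of laureates per decade (the flag column has no missing values, so count gives the row count)
decade_sum = nobel.groupby("decade")["USA_born_winers"].agg("count")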
decade_sum
decade
1900     57
1910     40
1920     54
1930     56
1940     43
1950     72
1960     79
1970    104
1980     97
1990    104
2000    123
2010    121
2020     50
Name: USA_born_winers, dtype: int64
USA_only = nobel[nobel["USA_born_winers"]]
USA_only.head(10)
year category prize motivation prize_share laureate_id laureate_type full_name birth_date birth_city birth_country sex organization_name organization_city organization_country death_date death_city death_country USA_born_winers decade
35 1906 Peace The Nobel Peace Prize 1906 NaN 1/1 470 Individual Theodore Roosevelt 1858-10-27 New York, NY United States of America Male NaN NaN NaN 1919-01-06 Oyster Bay, NY United States of America True 1900
72 1912 Peace The Nobel Peace Prize 1912 NaN 1/1 480 Individual Elihu Root 1845-02-15 Clinton, NY United States of America Male NaN NaN NaN 1937-02-07 New York, NY United States of America True 1910
79 1914 Chemistry The Nobel Prize in Chemistry 1914 "in recognition of his accurate determinations... 1/1 175 Individual Theodore William Richards 1868-01-31 Germantown, PA United States of America Male Harvard University Cambridge, MA United States of America 1928-04-02 Cambridge, MA United States of America True 1910
95 1919 Peace The Nobel Peace Prize 1919 NaN 1/1 483 Individual Thomas Woodrow Wilson 1856-12-28 Staunton, VA United States of America Male NaN NaN NaN 1924-02-03 Washington, DC United States of America True 1910
117 1923 Physics The Nobel Prize in Physics 1923 "for his work on the elementary charge of elec... 1/1 28 Individual Robert Andrews Millikan 1868-03-22 Morrison, IL United States of America Male California Institute of Technology (Caltech) Pasadena, CA United States of America 1953-12-19 San Marino, CA United States of America True 1920
124 1925 Peace The Nobel Peace Prize 1925 NaN 1/2 489 Individual Charles Gates Dawes 1865-08-27 Marietta, OH United States of America Male NaN NaN NaN 1951-04-23 Evanston, IL United States of America True 1920
138 1927 Physics The Nobel Prize in Physics 1927 "for his discovery of the effect named after him" 1/2 33 Individual Arthur Holly Compton 1892-09-10 Wooster, OH United States of America Male University of Chicago Chicago, IL United States of America 1962-03-15 Berkeley, CA United States of America True 1920
149 1929 Peace The Nobel Peace Prize 1929 NaN 1/1 494 Individual Frank Billings Kellogg 1856-12-22 Potsdam, NY United States of America Male NaN NaN NaN 1937-12-21 St. Paul, MN United States of America True 1920
152 1930 Literature The Nobel Prize in Literature 1930 "for his vigorous and graphic art of descripti... 1/1 603 Individual Sinclair Lewis 1885-02-07 Sauk Centre, MN United States of America Male NaN NaN NaN 1951-01-10 Rome Italy True 1930
160 1931 Peace The Nobel Peace Prize 1931 NaN 1/2 496 Individual Jane Addams 1860-09-06 Cedarville, IL United States of America Female NaN NaN NaN 1935-05-21 Chicago, IL United States of America True 1930
# Number of US-born laureates per decade
decade_sum_usa = USA_only.groupby("decade")["USA_born_winers"].agg("count")
decade_sum_usa
decade
1900     1
1910     3
1920     4
1930    14
1940    13
1950    21
1960    21
1970    33
1980    31
1990    42
2000    52
2010    38
2020    18
Name: USA_born_winers, dtype: int64
USA_decade_prop = pd.DataFrame(decade_sum_usa/decade_sum)
# idxmax returns the index label (the decade) of the largest proportion; int() gives a plain Python integer
max_decade_usa = int(USA_decade_prop["USA_born_winers"].idxmax())
max_decade_usa
2000
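
The two-step count above (US-born count divided by total count) can be collapsed into one step, because the mean of a boolean column is the proportion of True values. A minimal sketch on made-up data follows; the toy frame and the usa_born column name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (not real laureates)
toy = pd.DataFrame({
    "year": [1901, 1905, 1912, 1913, 1918, 1925],
    "birth_country": [
        "France", "United States of America", "United States of America",
        "United States of America", "Germany", "Sweden",
    ],
})

toy["usa_born"] = toy["birth_country"] == "United States of America"
toy["decade"] = (toy["year"] // 10) * 10

# Mean of a boolean column = share of True values within each decade
prop_by_decade = toy.groupby("decade")["usa_born"].mean()

# idxmax gives the decade label with the highest share
toy_max_decade = int(prop_by_decade.idxmax())
print(prop_by_decade)
print(toy_max_decade)
```

In this toy frame the shares are 1/2 for the 1900s, 2/3 for the 1910s, and 0 for the 1920s, so the sketch selects 1910.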

Task 3

What decade and category pair had the highest proportion of female laureates? Store this as a dictionary called max_female_dict where the decade is the key and the category is the value.

decade_N_cat = nobel.groupby(["decade","category"])["prize"].agg("count")
decade_N_cat
decade  category  
1900    Chemistry      9
        Literature    10
        Medicine      11
        Peace         14
        Physics       13
                      ..
2020    Economics      9
        Literature     4
        Medicine       8
        Peace          7
        Physics       12
Name: prize, Length: 72, dtype: int64
nobel["male"] = nobel["sex"]=="Male"
nobel_male_only=nobel[nobel["male"]]
nobel_male_only.shape

decade_N_cat_male = nobel_male_only.groupby(["decade","category"])["prize"].agg("count")
decade_N_cat_male
decade  category  
1900    Chemistry      9
        Literature     9
        Medicine      11
        Peace         12
        Physics       12
                      ..
2020    Economics      8
        Literature     2
        Medicine       7
        Peace          2
        Physics       10
Name: prize, Length: 72, dtype: int64
male_prop = decade_N_cat_male/decade_N_cat
male_prop
decade  category  
1900    Chemistry     1.000000
        Literature    0.900000
        Medicine      1.000000
        Peace         0.857143
        Physics       0.923077
                        ...   
2020    Economics     0.888889
        Literature    0.500000
        Medicine      0.875000
        Peace         0.285714
        Physics       0.833333
Name: prize, Length: 72, dtype: float64
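# Caution: 1 - male_prop treats every row that is not "Male" as female, including
# organizations, whose "sex" value is missing. This inflates categories such as Peace.
female_prop = 1-male_prop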
female_prop
np.max(female_prop)
max_female_prop = female_prop.idxmax()
max_female_dict = dict([max_female_prop])
max_female_dict
{2020: 'Peace'}
# Calculating the proportion of female laureates per decade and category
nobel['female_winner'] = nobel.sex=='Female'
prop_female_winners = nobel.groupby(['decade','category'],as_index=False)['female_winner'].mean()
prop_female_winners.sort_values("female_winner",ascending=False,inplace=True)
prop_female_winners
    decade    category  female_winner
68    2020  Literature       0.500000
64    2010       Peace       0.357143
50    1990  Literature       0.300000
56    2000  Literature       0.300000
66    2020   Chemistry       0.300000
..     ...         ...            ...
34    1960       Peace       0.000000
37    1970   Economics       0.000000
38    1970  Literature       0.000000
41    1970     Physics       0.000000
36    1970   Chemistry       0.000000

72 rows × 3 columns

#max_female_dict = dict([prop_female_winners["decade"].iloc[0],])


decade=prop_female_winners["decade"].iloc[0]
category=prop_female_winners["category"].iloc[0]
print(category)
max_female_dict = {decade:category}
max_female_dict
Literature
{2020: 'Literature'}
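
The two answers above differ ({2020: 'Peace'} from 1 - male_prop, {2020: 'Literature'} from the female_winner mean). A likely explanation is that rows with a missing sex value, such as organizations, are counted as "not male" by the first approach. The sketch below reproduces that effect on a made-up frame; the data and the is_male / is_female column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two of the three Peace rows have no "sex" value,
# as would be the case for organizations
toy = pd.DataFrame({
    "decade":   [2020, 2020, 2020, 2020, 2020],
    "category": ["Peace", "Peace", "Peace", "Literature", "Literature"],
    "sex":      ["Male", None, None, "Female", "Male"],
})

toy["is_male"] = toy["sex"] == "Male"
toy["is_female"] = toy["sex"] == "Female"
shares = toy.groupby(["decade", "category"])[["is_male", "is_female"]].mean()

# Complement of the male share: missing values end up counted as female
via_complement = (1 - shares["is_male"]).idxmax()

# Direct female share: missing values count as neither male nor female
direct = shares["is_female"].idxmax()

print(via_complement)
print(direct)
```

Here the complement picks Peace (1 - 1/3) while the direct share picks Literature (1/2), which mirrors the discrepancy between the two cells above.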

Task 4

Who was the first woman to receive a Nobel Prize, and in what category? Save your string answers as first_woman_name and first_woman_category.

# sort_values returns a new DataFrame, so no in-place change is made to a slice of nobel
nobel_female_only = nobel[nobel["female_winner"]].sort_values("year", ascending=True)
nobel_female_only
year category prize motivation prize_share laureate_id laureate_type full_name birth_date birth_city ... organization_name organization_city organization_country death_date death_city death_country USA_born_winers decade male female_winner
19 1903 Physics The Nobel Prize in Physics 1903 "in recognition of the extraordinary services ... 1/4 6 Individual Marie Curie, née Sklodowska 1867-11-07 Warsaw ... NaN NaN NaN 1934-07-04 Sallanches France False 1900 False True
29 1905 Peace The Nobel Peace Prize 1905 NaN 1/1 468 Individual Baroness Bertha Sophie Felicita von Suttner, n... 1843-06-09 Prague ... NaN NaN NaN 1914-06-21 Vienna Austria False 1900 False True
51 1909 Literature The Nobel Prize in Literature 1909 "in appreciation of the lofty idealism, vivid ... 1/1 579 Individual Selma Ottilia Lovisa Lagerlöf 1858-11-20 Mårbacka ... NaN NaN NaN 1940-03-16 Mårbacka Sweden False 1900 False True
62 1911 Chemistry The Nobel Prize in Chemistry 1911 "in recognition of her services to the advance... 1/1 6 Individual Marie Curie, née Sklodowska 1867-11-07 Warsaw ... Sorbonne University Paris France 1934-07-04 Sallanches France False 1910 False True
128 1926 Literature The Nobel Prize in Literature 1926 "for her idealistically inspired writings whic... 1/1 597 Individual Grazia Deledda 1871-09-27 Nuoro, Sardinia ... NaN NaN NaN 1936-08-15 Rome Italy False 1920 False True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
982 2022 Literature The Nobel Prize in Literature 2022 "for the courage and clinical acuity with whic... 1/1 1017 Individual Annie Ernaux 1940-09-01 Lillebonne ... NaN NaN NaN NaN NaN NaN False 2020 False True
993 2023 Physics The Nobel Prize in Physics 2023 "for experimental methods that generate attose... 1/3 1028 Individual Anne L’Huillier 1958-08-16 Paris ... Lund University Lund Sweden NaN NaN NaN False 2020 False True
998 2023 Peace The Nobel Peace Prize 2023 "for her fight against the oppression of women... 1/1 1033 Individual Narges Mohammadi 1972-04-21 Zanjan ... NaN NaN NaN NaN NaN NaN False 2020 False True
989 2023 Medicine The Nobel Prize in Physiology or Medicine 2023 "for their discoveries concerning nucleoside b... 1/2 1024 Individual Katalin Karikó 1955-01-17 Szolnok ... Szeged University Szeged Hungary NaN NaN NaN False 2020 False True
999 2023 Economics The Sveriges Riksbank Prize in Economic Scienc... "for having advanced our understanding of wome... 1/1 1034 Individual Claudia Goldin 1946-00-00 New York, NY ... Harvard University Cambridge, MA United States of America NaN NaN NaN True 2020 False True

65 rows × 22 columns

first_woman_name=nobel_female_only["full_name"].iloc[0]
first_woman_name
'Marie Curie, née Sklodowska'
first_woman_category = nobel_female_only["category"].iloc[0]
first_woman_category
'Physics'
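
Sorting the whole subset is not strictly needed to find the earliest row. A minimal sketch using idxmin on made-up data follows (the toy frame and laureate names are hypothetical); it also avoids modifying a filtered copy in place.

```python
import pandas as pd

# Hypothetical toy data (placeholder names, not real laureates)
toy = pd.DataFrame({
    "year": [1901, 1903, 1905, 1903],
    "category": ["Chemistry", "Physics", "Peace", "Physics"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D"],
    "sex": ["Male", "Female", "Female", "Male"],
})

women = toy[toy["sex"] == "Female"]

# idxmin returns the index label of the smallest year
# (the first such label if several rows share that year)
first_idx = women["year"].idxmin()
toy_first_name = women.loc[first_idx, "full_name"]
toy_first_category = women.loc[first_idx, "category"]

print(toy_first_name, toy_first_category)
```

If several women shared the earliest year, this sketch would report only the first of those rows, so a tie check on the minimum year may be worth adding.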

Task 5

Which individuals or organizations have won multiple Nobel Prizes throughout the years? Store the full names in a list named repeat_list.

nobel_by_name = pd.DataFrame(nobel.groupby("full_name")["year"].agg("count"))
nobel_by_name.sort_values("year",ascending=False,inplace=True)
nobel_by_name=nobel_by_name[nobel_by_name["year"]>=2]
nobel_by_name
                                                                                   year
full_name
Comité international de la Croix Rouge (International Committee of the Red Cross)     3
Office of the United Nations High Commissioner for Refugees (UNHCR)                   2
Frederick Sanger                                                                      2
Linus Carl Pauling                                                                    2
John Bardeen                                                                          2
Marie Curie, née Sklodowska                                                           2
repeat_list = list(nobel_by_name.index)
repeat_list
['Comité international de la Croix Rouge (International Committee of the Red Cross)',
 'Office of the United Nations High Commissioner for Refugees (UNHCR)',
 'Frederick Sanger',
 'Linus Carl Pauling',
 'John Bardeen',
 'Marie Curie, née Sklodowska']
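
Grouping by full_name works here, but it depends on each laureate's name being spelled identically on every row. A hedged alternative is to count by laureate_id instead, assuming the id uniquely identifies a laureate (the two Marie Curie rows shown earlier both carry laureate_id 6, which is consistent with that assumption but does not prove it). The toy frame below is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: laureate 1 appears under two different spellings
toy = pd.DataFrame({
    "laureate_id": [1, 2, 1, 3, 3],
    "full_name": [
        "Laureate One", "Laureate Two", "Laureate One (alt. spelling)",
        "Org Three", "Org Three",
    ],
})

# Count prizes per id and keep ids that occur at least twice
id_counts = toy["laureate_id"].value_counts()
repeat_ids = id_counts[id_counts >= 2].index

# Report one name per repeat id (the first spelling encountered)
toy_repeat_list = (
    toy[toy["laureate_id"].isin(repeat_ids)]
    .drop_duplicates("laureate_id")["full_name"]
    .tolist()
)

# Name-based counting misses the laureate whose name is spelled two ways
name_counts = toy["full_name"].value_counts()
name_only = name_counts[name_counts >= 2].index.tolist()

print(toy_repeat_list)
print(name_only)
```

On the real data both approaches may well give the same six names; the id-based version is only a safeguard against inconsistent spellings.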

Solution provided by DataCamp

# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Read in the Nobel Prize data
nobel = pd.read_csv('datasets/nobel.csv')

# Store and display the most commonly awarded gender and birth country in requested variables
top_gender = nobel['sex'].value_counts().index[0]
top_country = nobel['birth_country'].value_counts().index[0]

print("\n The gender with the most Nobel laureates is :", top_gender)
print(" The most common birth country of Nobel laureates is :", top_country)

# Calculate the proportion of USA born winners per decade
nobel['usa_born_winner'] = nobel['birth_country'] == 'United States of America'
nobel['decade'] = (np.floor(nobel['year'] / 10) * 10).astype(int)
prop_usa_winners = nobel.groupby('decade', as_index=False)['usa_born_winner'].mean()

# Identify the decade with the highest proportion of US-born winners
max_decade_usa = prop_usa_winners[prop_usa_winners['usa_born_winner'] == prop_usa_winners['usa_born_winner'].max()]['decade'].values[0]

# Optional: Plotting USA born winners
ax1 = sns.relplot(x='decade', y='usa_born_winner', data=prop_usa_winners, kind="line")

# Calculating the proportion of female laureates per decade and category
nobel['female_winner'] = nobel['sex'] == 'Female'
prop_female_winners = nobel.groupby(['decade', 'category'], as_index=False)['female_winner'].mean()

# Find the decade and category with the highest proportion of female laureates
max_female_decade_category = prop_female_winners[prop_female_winners['female_winner'] == prop_female_winners['female_winner'].max()][['decade', 'category']]

# Create a dictionary with the decade and category pair
max_female_dict = {max_female_decade_category['decade'].values[0]: max_female_decade_category['category'].values[0]}

# Optional: Plotting female winners with % winners on the y-axis
ax2 = sns.relplot(x='decade', y='female_winner', hue='category', data=prop_female_winners, kind="line")

# Finding the first woman to win a Nobel Prize
nobel_women = nobel[nobel['female_winner']]
min_row = nobel_women[nobel_women['year'] == nobel_women['year'].min()]
first_woman_name = min_row['full_name'].values[0]
first_woman_category = min_row['category'].values[0]
print(f"\n The first woman to win a Nobel Prize was {first_woman_name}, in the category of {first_woman_category}.")

# Selecting the laureates that have received 2 or more prizes
counts = nobel['full_name'].value_counts()
repeats = counts[counts >= 2].index
repeat_list = list(repeats)

print("\n The repeat winners are :", repeat_list)

 The gender with the most Nobel laureates is : Male
 The most common birth country of Nobel laureates is : United States of America

 The first woman to win a Nobel Prize was Marie Curie, née Sklodowska, in the category of Physics.

 The repeat winners are : ['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Linus Carl Pauling', 'John Bardeen', 'Frederick Sanger', 'Marie Curie, née Sklodowska', 'Office of the United Nations High Commissioner for Refugees (UNHCR)']