TDM 20200: Project 9 — 2023

Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this project, we will create plots using the plotly package, as well as do some data manipulation using pandas.

Context: This is the second project focused on creating visualizations in Python.

Scope: plotly, Python, pandas

Learning Objectives
  • Demonstrate the ability to create basic graphs with default settings.

  • Demonstrate the ability to modify axes labels and titles.

  • Demonstrate the ability to customize a plot (color, shape/linetype).

Make sure to read about and use the template found here, and review the important information about project submissions here.

Dataset(s)

The questions below use the following dataset(s):

  • /anvil/projects/tdm/data/open_food_facts/openfoodfacts.tsv

  • /anvil/projects/tdm/data/techcrunch/test_full.csv

  • /anvil/projects/tdm/data/stackoverflow/unprocessed/2021.csv

  • /anvil/projects/tdm/data/disney/total.parquet

  • /anvil/projects/tdm/data/beer/*.parquet
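
The last entry is a wildcard pattern rather than a single file. One possible way to load every matching file into a single dataframe is to expand the pattern with glob and concatenate the pieces. The sketch below is only an illustration: read_many is a hypothetical helper name (not part of pandas), and reading parquet assumes an engine such as pyarrow is installed.

```python
import glob

import pandas as pd


def read_many(pattern, reader=pd.read_parquet):
    """Read every file matching a glob pattern and stack the rows.

    `read_many` is a hypothetical helper for illustration, not a pandas function.
    `reader` is any function that takes a path and returns a DataFrame.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [reader(path) for path in paths]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# On Anvil (assumes a parquet engine such as pyarrow is available):
# beer = read_many("/anvil/projects/tdm/data/beer/*.parquet")
```

Passing the reader as an argument keeps the helper usable for the CSV and TSV files as well, for example with reader=pd.read_csv.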

Questions

Question 1

Read /anvil/projects/tdm/data/open_food_facts/openfoodfacts.tsv into a pandas dataframe, and create a bar plot using plotly that shows the 10 foods with the largest carbon footprint. The x-axis should be the food name, and the y-axis should be the carbon footprint value. Do you notice anything odd about the results? How does plotly handle identical names on the x-axis?

Make sure to use fig.show(renderer='jpg') to display your plot. Otherwise, the graders will not be able to see your plot, and you will lose credit.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 2

There are 4 other specified datasets for this project. Choose one that you have not yet used for a previous question, wrangle the data, and use plotly to create a graphic that is completely different from every other graphic you have created in this project. For example, for this question, you can no longer use the openfoodfacts.tsv dataset, and you can no longer use a bar plot.

The resulting plot must be refined: it should have a proper, cleaned-up title and proper, cleaned-up x- and y-axis labels, and it should be easy to read. Add a single markdown cell describing what the plot is showing, and what you learned from it (if anything).

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 3

There are 4 other specified datasets for this project. Choose one that you have not yet used for a previous question, wrangle the data, and use plotly to create a graphic that is completely different from every other graphic you have created in this project.

The resulting plot must be refined: it should have a proper, cleaned-up title and proper, cleaned-up x- and y-axis labels, and it should be easy to read. Add a single markdown cell describing what the plot is showing, and what you learned from it (if anything).

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 4

There are 4 other specified datasets for this project. Choose one that you have not yet used for a previous question, wrangle the data, and use plotly to create a graphic that is completely different from every other graphic you have created in this project. For this question, please choose a plot from the "1D Distributions" section on this page.

The resulting plot must be refined: it should have a proper, cleaned-up title and proper, cleaned-up x- and y-axis labels, and it should be easy to read. Add a single markdown cell describing what the plot is showing, and what you learned from it (if anything).

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 5

There are 4 other specified datasets for this project. Choose one that you have not yet used for a previous question, wrangle the data, and use plotly to create a graphic that is completely different from every other graphic you have created in this project. For this question, please choose a plot from any of the sections below (and including) the "3-Dimensional" section on this page.

The resulting plot must be refined: it should have a proper, cleaned-up title and proper, cleaned-up x- and y-axis labels, and it should be easy to read. Add a single markdown cell describing what the plot is showing, and what you learned from it (if anything).

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted.

In addition, please review our submission guidelines before submitting your project.