Exercises for Plotly Express#

These exercises revolve around using Plotly Express. We have not included this package in the list of packages needed in IN1910, so if you want to do these exercises you might need to install it yourself. You can do this through pip; try the following command in Jupyter:

  • !pip install plotly

If you are working in Jupyter Notebook (locally) you should also run the following command to make sure your Jupyter is up to date and can show the graphs correctly.

  • !pip install "notebook>=5.3" "ipywidgets>=7.5"

If these commands do not work, and you have installed Python through Anaconda, you can instead try to install through conda. You can read more about that here:

If you cannot get this to work, please ask for help on Mattermost.

Irises#

These exercises revolve around a famous dataset published by the statistician and biologist Ronald Fisher in 1936, containing size measurements of 150 different Iris flowers. The dataset has become a popular example dataset for data science and machine learning. It covers 3 different types of Iris flowers, with 50 samples of each type.

The dataset has its own Wikipedia page, which you can read if you want more background information, though it won’t be necessary to solve the exercises.

Figure 1: An Iris flower of the Versicolor variant. Source: Wikimedia Commons (CC BY-SA 3.0).

Exercise a) Reading in the data#

Import plotly.express as px and read in the Iris dataset using the command px.data.iris(). Then use the info, describe and head methods to explore the dataset briefly. What information do we have available?

Exercise b) Finding average data#

Use the groupby and mean methods to find the average data for each species of Iris. How many species are there? Which one seems to be the largest?
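If you have not used groupby before, the sketch below shows the mechanics on a tiny made-up dataframe (the column names and numbers are invented for illustration and have nothing to do with the Iris data):

```python
import pandas as pd

# Made-up toy data, only to illustrate how groupby and mean combine
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "height": [1.0, 3.0, 10.0, 20.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
})

# One row per group, one column per numeric variable
means = toy.groupby("group").mean()
print(means)
```

The result is a new dataframe indexed by the grouping column. If a dataframe has additional text columns, you will likely need mean(numeric_only=True) in recent pandas versions.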

Exercise c) Plotting data#

Using Plotly Express, make a scatter plot showing the “petal length” vs the “petal width”. Are the three species easy to differentiate in the plot, or do they overlap? In other words: Can you decide which species a given flower is based on the petal measurements alone?

You can also add a regression line with trendline='ols' if you want to, but note that this requires that you have the statsmodels package installed.

Exercise d) Adding colors#

If you have not already done so, add colors to your scatter plot to visually differentiate your three species. Explain what happens in your plot and what these results mean.

If you added a trendline, what happens to the trendline when you group data by species?

Exercise e) Sepal measurements#

Repeat exercises c and d, but this time plot the sepal length and width instead of the petal length and width. Explain the main differences between the petal plot and the sepal plot.

Exercise f) Comparing petal and sepal size#

We now want to compare the petal size with the sepal size, but it gets a bit tricky plotting two dimensions against two dimensions. Therefore we first add two new columns to our dataset.

Add a petal_size and a sepal_size column. These should be computed by multiplying the length and width of the petal and sepal respectively (the “size” is then effectively a sort of area measurement).
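Adding a computed column to a pandas dataframe is done by assigning to a new column name. A minimal sketch on made-up data (the dataframe and its column names are hypothetical):

```python
import pandas as pd

# Made-up rectangles, only to illustrate column arithmetic
boxes = pd.DataFrame({"length": [2.0, 3.0], "width": [1.5, 4.0]})

# Multiplying two columns works row by row and yields a new column
boxes["area"] = boxes["length"] * boxes["width"]
print(boxes)
```

The same pattern applies to the petal and sepal columns of the Iris dataframe.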

Exercise g) Plotting sepal vs petal#

Make a scatter plot of the sepal_size vs the petal_size. Explain how the three species compare in this figure.

Exercise h) Box plot#

We now want to make a box plot. If you have never heard of such a plot, it is a common way of displaying several statistical distributions next to each other.

Run the code given below, look at the result and explain what it is telling us. (You might need to change the name of the dataframe in the code.) Try changing which data is shown in the box plots.

fig = px.box(iris, x='species', y='petal_length', color='species')
fig.show()
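When reading the figure, it can help to recall that the box in a box plot spans the first to the third quartile, with a line at the median. The sketch below computes these three numbers for a made-up series using pandas. Note that Plotly may use a slightly different convention for computing quartiles, so the values in your figure need not match pandas exactly:

```python
import pandas as pd

# Made-up values, only to illustrate what the box summarizes
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# First quartile, median and third quartile
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
print(q1, median, q3)
```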

Exercise i) Violin plot#

A slight variation of a box plot is the violin plot. Replace px.box with px.violin in the command, keeping everything else the same. Explain the figure you get.

Exercise j) Marginal plots#

Now make a new scatter plot of petal_size vs sepal_size (you can copy your code from exercise g), but this time add the keyword arguments

marginal_x='box'
marginal_y='box'

What does this do to the figure?

Exercise k) Histogram#

Finally, we want to make a histogram. Make a figure using px.histogram. Use the official documentation or the help command to check what parameters you can add and what they do. Play around with it. You should especially check out the arguments nbins and barmode. What do these do?