For this project you must create a data set by simulating a real-world phenomenon of
your choosing. You may pick any phenomenon you wish – you might pick one that is
of interest to you in your personal or professional life. Then, rather than collect data
related to the phenomenon, you should model and synthesise such data using Python.
We suggest you use the numpy.random package for this purpose.
Specifically, in this project you should:
Choose a real-world phenomenon that can be measured and for which you could collect at least one hundred data points across at least four different variables.
Investigate the types of variables involved, their likely distributions, and their relationships with each other.
Synthesise/simulate a data set that matches those properties as closely as possible.
Detail your research and implement the simulation in a Jupyter notebook.
The data set itself can simply be displayed in an output cell within the notebook.
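The steps above can be sketched as follows. This is only an illustration under stated assumptions, not a model answer: the phenomenon (student exam performance), the variable names, the chosen distributions and every parameter value are hypothetical, and in your own project each choice should be justified by your research. The sketch uses numpy.random through its default_rng generator, draws one hundred data points for four variables, and makes the fourth variable depend on the other three.

```python
import numpy as np

# All variable names, distributions and parameters below are illustrative
# assumptions, not values taken from any real study.
rng = np.random.default_rng(seed=42)  # fixed seed so the notebook is reproducible
n = 100  # the brief asks for at least one hundred data points

# Variable 1: weekly study hours, assumed right-skewed (gamma distribution).
study_hours = rng.gamma(shape=4.0, scale=2.5, size=n)

# Variable 2: nightly sleep in hours, assumed roughly normal.
sleep_hours = rng.normal(loc=7.0, scale=1.0, size=n)

# Variable 3: lectures attended out of 24, assumed binomial.
lectures_attended = rng.binomial(n=24, p=0.8, size=n)

# Variable 4: exam score, assumed to depend linearly on the other three
# variables plus normal noise, then clipped to the valid 0-100 range.
noise = rng.normal(loc=0.0, scale=8.0, size=n)
exam_score = np.clip(
    20.0 + 2.0 * study_hours + 2.0 * sleep_hours + 1.0 * lectures_attended + noise,
    0.0,
    100.0,
)

# Stack the four columns into one array of shape (100, 4).
data = np.column_stack([study_hours, sleep_hours, lectures_attended, exam_score])

# Quick check that the simulated relationship shows up in the data.
corr = np.corrcoef(study_hours, exam_score)[0, 1]
print(data[:5].round(1))
print(f"correlation(study_hours, exam_score) = {corr:.2f}")
```

Leaving the array (or a table built from it) as the last expression in a notebook cell is enough to display the data set in an output cell.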
The minimum standard for this project is a Git repository containing a README, a .gitignore file and a Jupyter notebook.
The README need only contain an explanation of what is contained in the repository and how to run the Jupyter notebook. Your notebook should contain the main body of work and should list all references used in completing the project.
A good submission will be clearly organised and contain concise explanations of the
particularities of the data set. The analysis contained within the notebook will be
well conceived, interesting, and well researched.
Note that part of this project is about the use of Jupyter notebooks and so you should
make use of all the functionality available in the software including images, links,
code and plots.
You may use any Python libraries that you wish, whether they have been discussed in class or not.