This post was promised to some friendly folk in the R community who were interested in using python. It is a conceptual introduction and a basic guide to how to set up python in a way that avoids common pain points and traps which I fell into when I first started.
For R users, setting up python can be extremely confusing. The typical setup for R is to download R and RStudio and away we go. When googling how to set up python as an R user (or as someone who wasn’t formally trained in software engineering or computer science), we get back multiple tools and tutorials with no obvious place to start.
The first trap I fell into was installing the latest version of python and JupyterLab, then installing packages freely for all my projects (similar to how we would call install.packages() for any R package we needed). However, I learned very quickly that this causes problems when different projects need different package versions or python versions, leading me into dependency hell.
What are virtual environments?
A virtual environment is a project-specific python environment with its own python version and libraries. It is isolated from other python environments as well as the system python, so each project can have its own set of dependencies that do not conflict with those of other projects.
We want to create a separate virtual environment for each of our projects so that we can avoid dependency hell. There are many tools for creating virtual environments - conda, poetry, venv, etc. (as well as renv for R). For now, I will demonstrate how to use venv, since it ships with python and avoids installing additional tools that are not needed to get started (other approaches are listed in the appendix).
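One way to see this isolation from inside python is to compare two paths the interpreter exposes. The snippet below is a minimal sketch, not part of the original walkthrough: the helper name in_virtual_env is made up for illustration, and the check applies to environments created with venv.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the python installation it was created from.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_env())
```

Run with the system python, this should report False. Run with a python that lives inside a virtual environment, it should report True.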
Setting up virtual environments using venv
Prerequisites
Setting up and using python requires us to get comfortable with the command line, so be prepared to do some work in your terminal. This tutorial is biased towards macOS. While the principles are the same, the tools and commands may differ on other operating systems (commands for Windows can be found in the venv documentation).
The following walkthrough assumes you have a single version of python3 installed on your machine. Typically when starting a new project, we want to do these steps in order:
1. Create a project repo
2. Pick a python version using pyenv (optional but recommended)
3. Create virtual environment
4. Activate virtual environment
5. Install packages
Step 2 isn’t strictly required if you just want to get an idea of the basics, so I cover setting it up in the appendix. However, if you end up needing multiple versions of python3, I highly recommend using pyenv to manage them. If not, feel free to ignore it for now.
Virtual environments with venv
First, create and change into the project directory:
mkdir py_project
cd py_project
Then use python to create your virtual environment. After running this command you will see a folder called proj_env, which is where all your dependencies will live.
# Create virtual env, call it proj_env (or .proj_env if you want it hidden)
python3 -m venv proj_env
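To get a feel for what that command produces, the same step can be sketched with python's built-in venv module. This is an illustration under stated assumptions rather than a replacement for the command above: the helper create_env is hypothetical, the environment is built in a temporary directory so nothing is left behind, and with_pip=False is used only to keep the demo quick.

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent: Path, name: str = "proj_env") -> Path:
    """Create a virtual environment called `name` inside `parent` and return its path."""
    env_dir = parent / name
    # with_pip=False keeps this demo fast and offline; the command-line
    # `python3 -m venv` installs pip into the environment by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(Path(tmp))
    # pyvenv.cfg records which python installation the environment was built from.
    print((env_dir / "pyvenv.cfg").read_text())
    # The top-level entries of the environment folder.
    print(sorted(p.name for p in env_dir.iterdir()))
```

The listing should include pyvenv.cfg plus a bin folder (Scripts on Windows), which is where the activate script used in the next step lives.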
Activate your virtual env. If this works, you should see the name of your virtual env directory on the left-hand side of your terminal prompt:
source proj_env/bin/activate
(proj_env) benjaminwee@Benjamins-MacBook-Pro py_project %
Now you can pip install packages for your project. A standard way to do this is to list the packages you want in a requirements.txt file and install them all at once (otherwise you can install them individually using pip install <package_name>). You can also pin a specific package version, as I have done for matplotlib, which is something I recommend.
# Append each package to requirements.txt (>> appends, a single > would overwrite the file)
echo "numpy" >> requirements.txt
echo "pandas" >> requirements.txt
echo "matplotlib==3.7.0" >> requirements.txt
pip install -r requirements.txt
And that’s it! As long as your virtual environment is activated, pip install will install packages into proj_env and will not conflict with other python environments. If you want to exit your virtual environment then run:
deactivate
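Since pinning versions is the recommendation here, a small script can show which entries in a requirements list are pinned and which are not. The sketch below is only an illustration: split_pins is a made-up helper, and it understands just plain name and name==version lines, not the full requirements file format that pip accepts.

```python
def split_pins(lines):
    """Split requirement lines into (pinned, unpinned) lists."""
    pinned, unpinned = [], []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank lines
        if "==" in line:
            name, version = line.split("==", 1)
            pinned.append((name.strip(), version.strip()))
        else:
            unpinned.append(line)
    return pinned, unpinned


# The same three packages used in the walkthrough above.
requirements = ["numpy", "pandas", "matplotlib==3.7.0"]
pinned, unpinned = split_pins(requirements)
print("pinned:", pinned)      # [('matplotlib', '3.7.0')]
print("unpinned:", unpinned)  # ['numpy', 'pandas']
```

Anything that lands in the unpinned list will get whatever version pip resolves at install time, which is where differences between machines tend to creep in.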
Appendix A: Managing python versions with pyenv
For python version management I use pyenv (there is also pyenv for Windows, but I haven’t tried it). pyenv lets you download and manage multiple python versions across projects. There is a bit of setup involved, but it is worth it if you plan to use different python versions (and even if you don’t, sooner or later you will probably hit a package that only works with specific python versions).
I would go straight to the installation step and install via homebrew (a package manager for your mac, happy to answer questions if this is confusing) and set up the shell environment. Once this is done, it is easy to install different python versions and set them for different project repos.
# Install python version 3.10.4 and 3.10
pyenv install 3.10.4
pyenv install 3.10
# Check what python versions are installed on your system
pyenv versions
# Set global python version - this will be the default python version outside of any project repo
pyenv global 3.10
# Set a python version for a new project
cd py_project
pyenv local 3.10.4 # Set python version to 3.10.4 for this "local" repo
pyenv versions # check the correct python is set for the project
python # last check to make sure the correct python is being used for your project
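Once that python session is open, a couple of lines confirm which interpreter is actually running. This is a minimal sketch and interpreter_summary is a hypothetical helper name. If pyenv local 3.10.4 took effect, the printed version should start with 3.10.4.

```python
import sys


def interpreter_summary() -> str:
    """Describe the running interpreter as 'major.minor.micro at <path>'."""
    v = sys.version_info
    return f"{v.major}.{v.minor}.{v.micro} at {sys.executable}"


print(interpreter_summary())
```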
Then we can follow the same commands as before to create a virtual environment using this specific python version:
python -m venv proj_env
source proj_env/bin/activate
# Append each package to requirements.txt (>> appends, a single > would overwrite the file)
echo "numpy" >> requirements.txt
echo "pandas" >> requirements.txt
echo "matplotlib==3.7.0" >> requirements.txt
pip install -r requirements.txt
Appendix B: Different ways of setting up a python project
There are 3 setups I typically do for python. I walked through the first one above, but I’m happy to go through the others on request.
Approaches to setting up python:
- Basic environment setup - pyenv + venv
- poetry - requires an extra tool, but it handles dependency management between packages really well
- Docker - it’s great, but may take some time to set up if you’re new.