Pandas (website: https://pandas.pydata.org) is an open-source data analysis and manipulation tool. You can use it to transform text data, to restructure existing files, and to join data files, which makes it a good tool to add to your Data Science toolkit.
All operations are executed in memory, which makes Pandas useful for quickly prototyping data pipelines. I recommend that you follow along in a Jupyter notebook, which lets you run your code cell by cell while keeping your data in memory. You can install and open JupyterLab by executing two commands in your terminal:
pip install jupyterlab
jupyter lab
Every time you want to continue the tutorial, remember to execute the “jupyter lab” command.
Before we continue, please install Pandas in your Python environment by using the following command:
pip install pandas==1.3.0
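Once the installation has finished, a quick sanity check in a notebook cell can confirm that Pandas imports correctly and give a first taste of the joining and restructuring mentioned earlier. The snippet below is only a minimal sketch: the `customers` and `orders` tables, their column names, and their values are made up for illustration and are not part of this tutorial's data.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]}
)
orders = pd.DataFrame(
    {"customer_id": [1, 1, 3], "amount": [10.0, 5.0, 7.5]}
)

# Join the two tables on their shared key; everything stays in memory.
joined = customers.merge(orders, on="customer_id", how="inner")

# Restructure the joined data: total order amount per customer name.
totals = joined.groupby("name", as_index=False)["amount"].sum()

# Show which Pandas version is installed, then the aggregated table.
print(pd.__version__)
print(totals)
```

If the import fails, the most likely cause is that the notebook is running in a different Python environment from the one where you ran the pip command.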
In the next section, we will see what Pandas is not.