- ANALITICAS
- Posts
- Import data from a .csv file with Python and pandas
Import data from a .csv file with Python and pandas
Table of Contents
Fetching the data is the usual starting point of data analysis. In this lesson, we’ll how to import data from a .csv file using Python and its library pandas.
First of all, let’s import pandas:
import pandas as pd
The basic import
To read the data saved on a .csv file and use them to populate a pandas dataframe, we can leverage read_csv()
command:
df = pd.read_csv('file.csv')
Specify the file’s path
The example above would work if the .csv file and the Python script are in the same location (either in a cloud-based repository or on our machine).
If that is not the case, we need to specify the file’s path:
df = pd.read_csv(r'C:\User\path\file.csv')
Specify the separator
As you may already know, .csv files use a character the separate the data. This character is commonly referred to as “separator”.
The default separator of the read_csv()
command is a comma. However, if your .csv file uses a different character as separator, you must specify it.
To do this, you can use either the sep
or the delimiter
parameters.
Here’s a few examples using sep
:
df = pd.read_csv(r'C:\User\path\file.csv', sep=',')
df = pd.read_csv(r'C:\User\path\file.csv', sep=';')
df = pd.read_csv(r'C:\User\path\file.csv', sep=' ')
And here’s what our code would look like, if we used delimiter
:
df = pd.read_csv(r'C:\User\path\file.csv', delimiter=',')
df = pd.read_csv(r'C:\User\path\file.csv', delimiter=';')
df = pd.read_csv(r'C:\User\path\file.csv', delimiter=' ')
Handling white spaces as separator
If our separator is any type of white space (a tab, a double space, a single space, etc.), we can leverage the delim_whitespace
parameter, in alternative to the ones seen above.
Here’s how this would work:
df = pd.read_csv(r'C:\User\path\file.csv', delim_whitespace=True)
Important!
The default value of delim_whitespace
is “False”.
Specifying the header
If the header of our data isn’t the first row of the .csv file, you may want to specify that. To do so, you need to leverage the header
parameter:
df = pd.read_csv(r'C:\User\path\file.csv', header=1)
In this case, we’re signalling that the second row in the file will be the dataframe’s header.
Important!
Counting in Python starts at 0. Therefore the value of 1 corresponds to the second row.
Specifying the dataframe’s index
When creating a dataframe, pandas will add an index column. If you want make one of the columns in the .csv file the index, you can do that already within the read_csv()
command.
You can do that by specifying the position of the column:
df = pd.read_csv(r'C:\User\path\file.csv', index_col=0)
Or by specifying the column name:
df = pd.read_csv(r'C:\User\path\file.csv', index_col=['id'])
Only importing some columns
It can happen that we don’t need all the data on our .csv file. Rather than removing the unwanted columns later on, we can decide not to import them from the very beginning.
This is done by leveraging the usecols
parameter.
What if you need to import data from an Excel file? We’ve got you covered. Here’s a quick lesson about that.
Don’t miss out the next issue of the ANALITICAS newsletter! Subscribe now: