• ANALITICAS
  • Posts
  • Import data from a .csv file with Python and pandas

Import data from a .csv file with Python and pandas

Table of Contents

Fetching the data is the usual starting point of data analysis. In this lesson, we’ll how to import data from a .csv file using Python and its library pandas.

First of all, let’s import pandas:

import pandas as pd

The basic import

To read the data saved on a .csv file and use them to populate a pandas dataframe, we can leverage read_csv() command:

df = pd.read_csv('file.csv')

Specify the file’s path

The example above would work if the .csv file and the Python script are in the same location (either in a cloud-based repository or on our machine).

If that is not the case, we need to specify the file’s path:

df = pd.read_csv(r'C:\User\path\file.csv')

Specify the separator

As you may already know, .csv files use a character the separate the data. This character is commonly referred to as “separator”.

The default separator of the read_csv() command is a comma. However, if your .csv file uses a different character as separator, you must specify it.

To do this, you can use either the sep or the delimiter parameters.

Here’s a few examples using sep:

df = pd.read_csv(r'C:\User\path\file.csv', sep=',')

df = pd.read_csv(r'C:\User\path\file.csv', sep=';')

df = pd.read_csv(r'C:\User\path\file.csv', sep=' ')

And here’s what our code would look like, if we used delimiter:

df = pd.read_csv(r'C:\User\path\file.csv', delimiter=',')

df = pd.read_csv(r'C:\User\path\file.csv', delimiter=';')

df = pd.read_csv(r'C:\User\path\file.csv', delimiter=' ')

Handling white spaces as separator

If our separator is any type of white space (a tab, a double space, a single space, etc.), we can leverage the delim_whitespace parameter, in alternative to the ones seen above.

Here’s how this would work:

df = pd.read_csv(r'C:\User\path\file.csv', delim_whitespace=True)

Important!

The default value of delim_whitespace is “False”.

Specifying the header

If the header of our data isn’t the first row of the .csv file, you may want to specify that. To do so, you need to leverage the header parameter:

df = pd.read_csv(r'C:\User\path\file.csv', header=1)

In this case, we’re signalling that the second row in the file will be the dataframe’s header.

Important!

Counting in Python starts at 0. Therefore the value of 1 corresponds to the second row.

Specifying the dataframe’s index

When creating a dataframe, pandas will add an index column. If you want make one of the columns in the .csv file the index, you can do that already within the read_csv() command.

You can do that by specifying the position of the column:

df = pd.read_csv(r'C:\User\path\file.csv', index_col=0)

Or by specifying the column name:

df = pd.read_csv(r'C:\User\path\file.csv', index_col=['id'])

Only importing some columns

It can happen that we don’t need all the data on our .csv file. Rather than removing the unwanted columns later on, we can decide not to import them from the very beginning.

This is done by leveraging the usecols parameter.

What if you need to import data from an Excel file? We’ve got you covered. Here’s a quick lesson about that.

Don’t miss out the next issue of the ANALITICAS newsletter! Subscribe now: