Read write parse CSV

How to Read, Write and Parse CSV in Python

A popular format that exchanges details through text files is the CSV format. It is easy to use CSV because you don’t have to build your own CSV parser. Python contains several suitable libraries you can use. One of them is a Python csv module that will work on many occasions. Also, there is a pandas library that has CSV parsing capabilities. This library will be useful if you need numerical analysis or your work requires lots of data.

What is CSV

Comma Separated Values file is a type of text file that allows data to be saved in a tabular format. In a CSV file in Python information is separated by commas, which is given away by his name. CSV file is used to move information between the programs that aren’t able to exchange data. Two programs can exchange data only if they both can open a CSV file. A sample CSV file for Python:

name of col 1, name of col 2, name of col 3 name
data of row 1, data of row 1, data of row 1
data of row 2, data of row 2, data of row 2
...

A typical CSV file in python example:

Clothes, Size, Price
T-shirt, Medium, $20
Pants, Medium, $25

The divider symbol is named a “delimiter”. In a Python CSV file, there can be anything as the separator. Here are other delimiters that are less popular: colon (:), semi-colon (;) and the tab (\t).

Parsing CSV Files by Using Python csv Library

To parse CSV Python means read the data from CSV. The csv library allows you to read from and write to CSV Python. This csv package Python you can describe the CSV formats understood by other applications or even define your own CSV format. The csv module contains reader and Python writer objects that help you manipulate CSV Python. Also, you can use Python DictReader and DictWriter classes to read and write data in dictionary form. Let’s see Python working with csv library.

How to Read CSV Files by Using the Python csv Package

As already mentioned, the CSV reader Python object is used for reading from a CSV file. Python open() CSV function opens the CSV program as a text. Here is an example:

name,birthday day,birthday month,birthday year
Lochana Cleitus,16,April,1995
Aberash Juliya,8,March,1999
Yunuen Walenty,3,January,1996

Here is a code to read this file:

import csv
with open('data.txt') as csv_file:
read_csv = csv.reader(csv_file, delimiter=',')
line_counter = 0
for text in read_csv:
if line_counter == 0:
print(f'Columns: {", ".join(text)}')
line_counter += 1
else:
print(f'\t{text[0]} was born on {text[2]} {text[1]}, {text[3]}.')
line_counter += 1
print(f'There are {line_counter} lines.')
#Output:
#Columns: name, birthday day, birthday month, birthday year
# Lochana Cleitus was born on April 16, 1995.
# Aberash Juliya was born on March 8, 1999.
# Yunuen Walenty was born on January 3, 1996.
#There are 4 lines.

Python Read CSV Files Into a Dictionary

Each string returned by the reader is a list of string elements containing the data that is found by removing the delimiters. The first row returned contains the column names that are treated in a special way.

Instead of creating a list of string elements, you can read the CSV data into a dictionary. Our input file is the same as last time:

name,birthday day,birthday month,birthday year
Lochana Cleitus,16,April,1995
Aberash Juliya,8,March,1999
Yunuen Walenty,3,January,1996

Here is Python csv DictReader example:

import csv
with open('data.txt', mode='r') as csv_file:
read_csv = csv.DictReader(csv_file)
line_counter = 0
for text in read_csv:
if line_counter == 0:
print(f'Columns: {", ".join(text)}')
line_counter += 1
print(f'\t{text["name"]} was born on {text["birthday month"]} {text["birthday day"]}, {text["birthday year"]}.')
line_counter += 1
print(f'There are {line_counter} lines.')
#Output:
#Columns: name, birthday day, birthday month, birthday year
# Lochana Cleitus was born on April 16, 1995.
# Aberash Juliya was born on March 8, 1999.
# Yunuen Walenty was born on January 3, 1996.
#There are 4 lines.

In the code above we used python csv open function. Keys for the dictionary are contained in the first line of the CSV file Python. If you don’t have them, you should determine your own keys by setting the fieldnames (optional parameter).

Python CSV Reader Parameters

In the csv library, the reader object can deal with different types of CSV files in Python by defining additional parameters. For example:

 

  • delimiter identifies the character that is used to separate each field. The comma is a default separator.
  • quotechar identifies the character that is used to surround areas that contain a delimiter character. A double quote is a default quotechar.
  • escapechar identifies the character that is used to escape the delimiter character, when there are no quotes. The default escapechar is no escape character. The following examples will help to better understand these parameters.
name,birthday
Lochana Cleitus,April 16,1995
Aberash Juliya,March 8,1999
Yunuen Walenty,January 3,1996

There are two fields name and birthday in this Python 3 CSV file, which are separated by commas. The data in the birthday field also contains a comma to signify the year. So we can’t use the default separator (comma).

 

  • Write the data in quotes

In quoted strings, the nature of your chosen delimiter is ignored. If you want to use different characters for quoting the quotechar optional parameter will help you with this.

  • Use a different delimiter

You can use the delimiter optional parameter to identify the new delimiter. In this case, the comma can be used in the data.

  • Escape the delimiter characters

To use escape characters, you must identify them using the escapechar optional parameter.

How to Write CSV Files by Using the CSV Module Python

Python CSV writer object and the .write_row() method can help you to write data to a CSV file. Here we also unfold the file with open csv python function.

import csv
with open('data.csv', mode='w') as data:
birthday_data = csv.writer(data, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
birthday_data.writerow(['Andrus Mile', '26', 'September', '1999'])
birthday_data.writerow(['Sofia Rangi', '11', 'July', '1997'])

The quotechar optional parameter defines which symbol to use in quote fields. No difference in using quoting or not, however, it is decided by the quoting parameter:

 

  • csv.QUOTE_MINIMAL

On this occasion .writerow() will quote fields which contain the delimiter or the quotechar. This quoting is a default case.

  • csv.QUOTE_NONE

On this occasion .writerow() won’t quote fields. Instead of this .writerow() function will escape delimiters. There you also must provide a value for the escapechar optional parameter.

  • csv.QUOTE_NONNUMERIC

On this occasion .writerow() will quote fields which contain text data and convert numeric fields to the float data type.

  • csv.QUOTE_ALL

On this occasion .writerow() will quote all fields.

 

So Python save to CSV file following data:

Andrus Mile,26,September
Sofia Rangi,11,July

Python write CSV File Into a Dictionary

You should be able to write data out from a dictionary. For this you need to use  csv.DictWriter and the following two methods .writeheader() (is used to write a row of column field names to the CSV program) and .writerow() (is used to write a single row of data into the file).

Here is Python csv DictWriter example:

import csv
with open('data.csv', mode='w') as csv_file:
fieldnames = ['name', 'birthday_day', 'birthday_month', 'birthday_year']
writer = csv.DictWriter(csv_file, fieldnames=fields)
writer.writeheader()
writer.writerow({'name': 'Lochana Cleitus', 'birthday_day': '16', 'birthday_month': 'April', 'birthday_year': '1995'})
writer.writerow({'name': 'Aberash Juliya', 'birthday_day': '8', 'birthday_month': 'March', 'birthday_year': '1999'})
writer.writerow({'name': 'Yunuen Walenty', 'birthday_day': '3', 'birthday_month': 'January', 'birthday_year': '1996'})

This code makes a file save as CSV Python:

name,birthday_day,birthday_month,birthday_year
Lochana Cleitus,16,April,1995
Aberash Juliya,8,March,1999
Yunuen Walenty,3,January,1996

CSV pandas

One more open-source Python library that can read and write CSV files is pandas. People often use pandas if their CSV files are huge to analyze.

pandas provides a lot of high-performance tools that can analyze data and also pandas provides easy-to-use data structures.

You can implement the pandas library in PyCharm as all other libraries, but the best option to work with it is using Anaconda’s Jupyter notebooks.

What is Anaconda? The in-built Python library has a lot of methods but there are a lot of other methods that are more useful and not included in it. That is where Anaconda distribution is useful. It is a free, open-source platform that allows you to execute code.

We need to talk about Jupyter notebook more. This application is web based so it allows you to use all the libraries that are included in Anaconda in your default web browser. Also there are useful features like: titles, comments and another text that you can write beyond the code.

 

So why do we need a Jupyter notebook? Because pandas is available in it as an in-built library and it works extremely well in Jupyter Notebook. You can share code, analyze results, and see graphs made in it by CSV. And also it is very easy to use it here.

Read CSV Files by pandas

Firstly, to try working with pandas we need to create a CSV file to work with. So let’s create one:

Name,Fire date,Salary,Sick Days remaining
Gioia Kellan,20/1/2004,500,10
Mieszko Ailis,10/1/2005,650,8
Tamara Prasad,10/2/2010,450,10
Terry Jones,2/10/2021,700,3
Dorotheos Caelestis,30/12/2022,480,7

The example of code will look like that:

import pandas as pd
example_data = pd.read_csv('data.csv')
print(example_data)
#Output:
# Name Fire date Salary Sick Days remaining
#0 Gioia Kellan 20/1/2004 500 10
#1 Mieszko Ailis 10/1/2005 650 8
#2 Tamara Prasad 10/2/2010 450 10
#3 Terry Jones 2/10/2021 700 3
#4 Dorotheos Caelestis 30/12/2022 480 7

We have operated with ‘import’ to show that we use the panda’s library and also we used ‘as’ for making it shorter in code. Anyway, that means that only 3 lines of code are needed to make our CSV file readable for us. ‘read_csv()’ operation reads. So here pandas have read the first line of our CSV file and used them in the output correctly.

The only note is that our ‘Fire date’ is the string type. We can check all our types by using this code:

import pandas as pd
example_data = pd.read_csv('data.csv')
print(type(example_data['Fire date'][0]))
#Output:
#<class 'str'>

parse_dates=[]

So we need to fix this. Let’s try to fix this issue. The best way is to use ‘parse_dates’ the optional parameter that convinces pandas to turn things into datetime types. This parameter receives only a list of columns(or multiple columns).

import pandas as pd
example_data = pd.read_csv('data.csv', parse_dates=['Fire date'])
print(example_data)
#Output:
# Name Fire date Salary Sick Days remaining
#0 Gioia Kellan 2004-01-20 500 10
#1 Mieszko Ailis 2005-10-01 650 8
#2 Tamara Prasad 2010-10-02 450 10
#3 Terry Jones 2021-02-10 700 3
#4 Dorotheos Caelestis 2022-12-30 480 7

Also if you use Jupyter notebook you can see this message after using ‘parse_dates’:

UserWarning: Parsing ’30/12/2022′ in DD/MM/YYYY format. Provide format or specify infer_datetime_format=True for consistent parsing.

  return tools.to_datetime(

So here Anaconda helps us that we can use different formats for editing our dataframe.

After our example let`s check the data type of our ‘Fire date’:

import pandas as pd
example_data = pd.read_csv('data.csv')
print(type(example_data['Fire date'][0]))
#Output:
#<class 'pandas._libs.tslibs.timestamps.Timestamp'>

So here Anaconda helps us that we can use different formats for editing our dataframe.

After our example let`s check the data type of our ‘Fire date’:

import pandas as pd
example_data = pd.read_csv('data.csv')
print(type(example_data['Fire date'][0]))
#Output:
#<class 'pandas._libs.tslibs.timestamps.Timestamp'>

And here we see that our column is formatted properly. Hooray!

index_col=””

Also, we see here that we have numbers on the left. But if we don’t need them, we can remove them.

The only way to do that is using an optional parameter called ‘index_col’. Index_col allows us to set which cols will be used as the index in our dataframe. Index_col receives the column name or its index that will be used as index column. 

import pandas as pd
example_data = pd.read_csv('data.csv', index_col='Name', parse_dates=['Fire date'])
print(example_data)
#Output:
# Fire date Salary Sick Days remaining
#Name 
#Gioia Kellan 2004-01-20 500 10
#Mieszko Ailis 2005-10-01 650 8
#Tamara Prasad 2010-10-02 450 10
#Terry Jones 2021-02-10 700 3
#Dorotheos Caelestis 2022-12-30 480 7

If your CSV files do not contain column names on the first line, there is a way to provide it by the ‘names’ optional parameter. It is also used when you want to replace the column names specified in the first line. In this case, you must also use ‘pandas.read_csv()’ to ignore existing column names with the optional ‘header=0’ parameter.
Let’s take this CSV file:

Hello, my, name, is
Gioia Kellan,20/1/2004,500,10
Mieszko Ailis,10/1/2005,650,8
Tamara Prasad,10/2/2010,450,10
Terry Jones,2/10/2021,700,3
Dorotheos Caelestis,30/12/2022,480,7

Here we have seen, we have the wrong first line. We can change it as in next example:

import pandas as pd
example_data = pd.read_csv('data.csv', 
index_col='Name', 
parse_dates=['Fire date'], 
header=0, 
names=['Name', 'Fire date','Salary', 'Sick Days remaining'])
print(example_data)
#Output:
# Fire date Salary Sick Days remaining
#Name 
#Gioia Kellan 2004-01-20 500 10
#Mieszko Ailis 2005-10-01 650 8
#Tamara Prasad 2010-10-02 450 10
#Terry Jones 2021-02-10 700 3
#Dorotheos Caelestis 2022-12-30 480 7

But take note that if we change our first line, we must take care of our ‘index_col’ and ‘parse_dates’ too!

How to Write CSV Python by pandas

So after getting knowed how to read CSV files by pandas, here you can Python write to CSV. It is much easier now. Let’s see the example:

import pandas as pd
example_data = pd.read_csv('data.csv', 
index_col='Name', 
parse_dates=['Fire date'], 
header=0, 
names=['Name', 'Fire date','Salary', 'Sick Days remaining'])
example_data.to_csv('data2.csv')

 Here we got a new function ‘to_csv’. All it does is that it creates the new file called as you named in quotes and saves the data to CSV python. And the data2.csv will be:

Hello, my, name, is
Gioia Kellan,20/1/2004,500,10
Mieszko Ailis,10/1/2005,650,8
Tamara Prasad,10/2/2010,450,10
Terry Jones,2/10/2021,700,3
Dorotheos Caelestis,30/12/2022,480,7

We got a copy of the file ‘data.csv’. So if you need to use ‘print’ as the writer for the new file you can use ‘to_csv’ for this.

How to Save CSV Python by pandas

To save CSV file Python to a folder you need to import one more package os.path.

import pandas as pd
import os.path
example_data.to_csv(os.path.join('folder',data.csv'))

How to Convert Python CSV to List by pandas

To convert a CSV file into a list you need to read the file using read_csv and convert it into a dataframe. Then you should convert each row into a list. Here is an example:

name,job
Sutekh Piritta, doctor
Bradley Tamari, teacher
Artur Jocasta, vet
import pandas as pd
example_data = pd.read_csv('data.csv', delimiter=',')
csv_list = [list(row) for row in example_data.values]
print(csv_list)
#Output:
#[['Sutekh Piritta', ' doctor'], ['Bradley Tamari', ' teacher'], ['Artur Jocasta', ' vet']]