The CSV (Comma Separated Values) format is quite popular for storing data. A large number of datasets are available as CSV files, which can either be used directly in spreadsheet software like Excel or be loaded into programming languages like R or Python. Pandas dataframes are quite powerful for handling two-dimensional tabular data. In this tutorial, we'll look at how to read a CSV file as a pandas dataframe in Python.
How to read csv files in python using pandas?
The pandas read_csv() function is used to read a CSV file into a dataframe. It comes with a number of parameters to customize how you'd like to read the file. The following is the general syntax for loading a CSV file into a dataframe:
import pandas as pd
df = pd.read_csv(path_to_file)
Here, path_to_file is the path to the CSV file you want to load. It can be any valid string path or a URL (see the examples below). The function returns a pandas dataframe. Let's look at some of the different use-cases of the read_csv() function through examples.
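If you'd like to try the general syntax before downloading anything, here is a minimal sketch. It assumes an in-memory text buffer (io.StringIO) standing in for a file on disk, since read_csv() also accepts file-like objects; the two-row CSV is made-up sample data.

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for a file on disk (made-up sample data)
csv_text = "Id,Species\n1,Iris-setosa\n2,Iris-setosa\n"

# read_csv() parses the text and returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text))
print(type(df))  # <class 'pandas.core.frame.DataFrame'>
print(df)
```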
Examples
Before we proceed, let's get a sample CSV file that we'll be using throughout this tutorial. We'll be using the Iris dataset, which you can download from Kaggle. Here's a snapshot of how it looks when opened in Excel.
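If you don't have the Kaggle file at hand, the sketch below writes a small stand-in Iris.csv to your current working directory so the local-file examples can run. This is an assumption for convenience, not the real dataset: it holds only the first five rows, with values copied from the outputs shown in this tutorial, so df.head() prints the same rows but anything beyond row five will differ. Skip it if you already downloaded the real file, since it would overwrite it.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle Iris.csv: only the first five rows,
# copied from the outputs shown in this tutorial
sample = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["Iris-setosa"] * 5,
})

# Write it without the dataframe's own index so the file has the same
# six columns as the Kaggle version
sample.to_csv("Iris.csv", index=False)
```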
1. Read CSV from its location on your machine
To read a CSV file stored locally on your machine, pass the path to the file to the read_csv() function. You can pass a relative path, that is, the path with respect to your current working directory, or you can pass an absolute path.
# read csv using relative path
import pandas as pd
df = pd.read_csv('Iris.csv')
print(df.head())
Output:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
In the above example, the CSV file Iris.csv is loaded from its location using a relative path. Here, the file is present in the current working directory. You can also read a CSV file from its absolute path. See the example below:
# read csv using absolute path
import pandas as pd
df = pd.read_csv(r"C:\Users\piyush\Downloads\Iris.csv")
print(df.head())
Output:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
Here, the same CSV file is read from its absolute path.
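To see that a relative path is simply resolved against the current working directory, here is a small sketch. It writes a hypothetical tiny_sample.csv (illustrative data, not part of the Iris dataset) and reads it once by relative path and once by an absolute path built with pathlib; read_csv() accepts pathlib.Path objects as well as strings.

```python
import os
from pathlib import Path

import pandas as pd

# Write a hypothetical tiny CSV into the current working directory
Path("tiny_sample.csv").write_text("Id,Species\n1,Iris-setosa\n2,Iris-setosa\n")

# A relative path is resolved against the current working directory...
print(os.getcwd())
relative_df = pd.read_csv("tiny_sample.csv")

# ...so it points at the same file as this absolute path
absolute_path = Path.cwd() / "tiny_sample.csv"
absolute_df = pd.read_csv(absolute_path)

print(relative_df.equals(absolute_df))  # True
```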
2. Read CSV from a URL
You can also read a CSV file from its URL. Pass the URL to the read_csv() function and it'll read the corresponding file into a dataframe. The Iris dataset can also be downloaded from the UCI Machine Learning Repository. Let's use their dataset download URL to read it as a dataframe.
import pandas as pd
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
df.head()
Output:
   5.1  3.5  1.4  0.2  Iris-setosa
0  4.9  3.0  1.4  0.2  Iris-setosa
1  4.7  3.2  1.3  0.2  Iris-setosa
2  4.6  3.1  1.5  0.2  Iris-setosa
3  5.0  3.6  1.4  0.2  Iris-setosa
4  5.4  3.9  1.7  0.4  Iris-setosa
You can see that the read_csv() function is able to read a dataset from its URL. Note that this particular data source does not have a header row. By default, read_csv() infers the header, so here it uses the first row of the dataset as the header.
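The effect of header inference is easy to reproduce without a network connection. In this sketch, an in-memory string holding three headerless rows in the same layout as iris.data (values taken from the outputs above) stands in for the downloaded file.

```python
import io

import pandas as pd

# Three headerless rows in the same layout as iris.data
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# With the default header inference and no names, the first row becomes the header
df = pd.read_csv(io.StringIO(raw))
print(list(df.columns))  # ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
print(len(df))           # 2, because the first data row was consumed as the header
```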
3. Read a CSV file without a header
In the above example, you saw that if the dataset does not have a header, the read_csv() function still infers one and uses the first row of the dataset as the header. You can change this behavior through the header parameter: pass None if your dataset does not have a header. You can also pass an integer, or a list of integers, giving the row number(s) to use as the header.
import pandas as pd
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=None)
df.head()
Output:
     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa
In the above example, we pass header=None to the read_csv() function since the dataset does not have a header.
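Here is the same idea as a self-contained sketch, again with an in-memory string standing in for iris.data. With header=None, every line is kept as data and the columns get default integer labels.

```python
import io

import pandas as pd

raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# header=None: no header row, so all three lines are data rows
df = pd.read_csv(io.StringIO(raw), header=None)
print(list(df.columns))  # [0, 1, 2, 3, 4]
print(len(df))           # 3
print(df.iloc[0, 4])     # Iris-setosa
```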
4. Read a CSV file and give custom column names
You can give custom column names to your dataframe when reading a CSV file using the read_csv() function. Pass your custom column names as a list to the names parameter.
import pandas as pd
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
                 names=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])
print(df.head())
Output:
   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0            5.1           3.5            1.4           0.2  Iris-setosa
1            4.9           3.0            1.4           0.2  Iris-setosa
2            4.7           3.2            1.3           0.2  Iris-setosa
3            4.6           3.1            1.5           0.2  Iris-setosa
4            5.0           3.6            1.4           0.2  Iris-setosa
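The UCI file has no header, so passing names alone is enough there. If a file already has a header row (like the Kaggle Iris.csv), passing names alone keeps that header line as an ordinary data row; pandas expects you to also pass header=0 to replace the existing names. A minimal sketch with made-up data and hypothetical column names:

```python
import io

import pandas as pd

# A CSV that already has a header row (made-up data)
csv_text = "Id,Species\n1,Iris-setosa\n2,Iris-setosa\n"
new_names = ["row_id", "label"]  # hypothetical custom names

# names alone: the existing header line is read as a data row
df_kept = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(len(df_kept))  # 3

# names plus header=0: row 0 is treated as the header and replaced
df_replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)
print(list(df_replaced.columns))  # ['row_id', 'label']
print(len(df_replaced))           # 2
```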
5. Read CSV with a column as index
You can also use a column as the row labels of the dataframe. Pass the column name to the index_col parameter. Going back to the Iris.csv we downloaded from Kaggle, here we use the Id column as the dataframe index.
# read csv with a column as index
import pandas as pd
df = pd.read_csv('Iris.csv', index_col='Id')
print(df.head())
Output:
    SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
Id
1             5.1           3.5            1.4           0.2  Iris-setosa
2             4.9           3.0            1.4           0.2  Iris-setosa
3             4.7           3.2            1.3           0.2  Iris-setosa
4             4.6           3.1            1.5           0.2  Iris-setosa
5             5.0           3.6            1.4           0.2  Iris-setosa
In the above example, you can see that the Id column is used as the row index of the dataframe df. You can also pass multiple columns as a list to the index_col parameter to use them together as the row index.
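As a sketch of the multi-column case, the example below uses a small made-up CSV with a Species column placed first. Passing a list to index_col builds a MultiIndex, so a row is then looked up by a (Species, Id) pair.

```python
import io

import pandas as pd

# Made-up data with two columns we want to use together as the row index
csv_text = (
    "Species,Id,SepalLengthCm\n"
    "Iris-setosa,1,5.1\n"
    "Iris-setosa,2,4.9\n"
    "Iris-versicolor,51,7.0\n"
)

# A list passed to index_col produces a two-level MultiIndex
df = pd.read_csv(io.StringIO(csv_text), index_col=["Species", "Id"])
print(df.index.nlevels)     # 2
print(list(df.index.names)) # ['Species', 'Id']

# Look up one value by its (Species, Id) pair
value = df.loc[("Iris-setosa", 2), "SepalLengthCm"]
print(value)
```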
6. Read just a subset of columns of a CSV
You can also specify the subset of columns to read from the dataset. Pass the subset of columns you want as a list to the usecols parameter. For example, let's read all the columns from Iris.csv except Id.
# read only a subset of columns
import pandas as pd
df = pd.read_csv('Iris.csv', usecols=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])
print(df.head())
Output:
   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0            5.1           3.5            1.4           0.2  Iris-setosa
1            4.9           3.0            1.4           0.2  Iris-setosa
2            4.7           3.2            1.3           0.2  Iris-setosa
3            4.6           3.1            1.5           0.2  Iris-setosa
4            5.0           3.6            1.4           0.2  Iris-setosa
In the above example, the returned dataframe does not have an Id column.
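Listing every wanted column gets tedious for wide files. usecols also accepts a callable that is evaluated against each column name, so "everything except Id" can be written as a condition. A sketch on a two-row in-memory copy of the Iris layout:

```python
import io

import pandas as pd

csv_text = (
    "Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n"
    "1,5.1,3.5,1.4,0.2,Iris-setosa\n"
    "2,4.9,3.0,1.4,0.2,Iris-setosa\n"
)

# Keep every column whose name is not "Id"
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name != "Id")
print(list(df.columns))
# ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species']
```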
7. Read only the first n rows of a CSV
You can also specify the number of rows of a file to read using the nrows parameter of the read_csv() function. This is particularly useful when you want to read a small segment of a large file.
# read only the first 3 rows
import pandas as pd
df = pd.read_csv('Iris.csv', nrows=3)
print(df.head())
Output:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
In the above example, we read only the first three rows of the file Iris.csv.
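To illustrate the "small segment of a large file" case, this sketch builds a made-up CSV with 10,000 data rows in memory and reads only the first five. Note that nrows counts data rows; the header line is not included in the count.

```python
import io

import pandas as pd

# Made-up "large" CSV with 10,000 data rows
lines = ["Id,Value"] + [f"{i},{i * 2}" for i in range(1, 10_001)]
big_csv = "\n".join(lines) + "\n"

# Parse only the first five data rows
df = pd.read_csv(io.StringIO(big_csv), nrows=5)
print(len(df))            # 5
print(df["Id"].tolist())  # [1, 2, 3, 4, 5]
```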
These are only some of the things you can do when reading a CSV file into a dataframe. Pandas dataframes also provide a number of useful features to manipulate the data once the dataframe has been created.
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel having pandas version 1.0.5.
Source: https://datascienceparichay.com/article/read-csv-files-using-pandas-with-examples/