reading-notes

pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Import pandas (and NumPy, which the examples use) before first use : import numpy as np and import pandas as pd

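Create a Series from a list : s = pd.Series([1, 3, 5, np.nan, 6, 8])

Create a data frame from a NumPy array : df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD")) , where dates is an index with six labels, for example pd.date_range("20130101", periods=6)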
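A minimal sketch of both constructors. Here `dates` is assumed to be six consecutive days starting 2013-01-01, and a seeded generator (the seed is arbitrary) replaces np.random.randn so the values are reproducible.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labelled array; np.nan marks a missing value.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# Assumed index: six consecutive days.
dates = pd.date_range("20130101", periods=6)

# Seeded generator so the random values are the same on every run.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(s)
print(df)
```

Because the list contains np.nan, the Series is stored as floating point rather than integers.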

To check the data type for each column : df.dtypes

To view the top and bottom rows of the frame : df.head() for the top, df.tail() for the bottom. Both show five rows by default; pass an integer to change how many rows are shown : df.tail(3)

Display the index and the columns : df.index , df.columns
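The inspection calls above can be tried on a small frame. This is a sketch with made-up data: the values 0 to 23 and the date index are assumptions chosen so the output is easy to read.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 6 rows, 4 float columns, indexed by consecutive days.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

top = df.head()      # first five rows (the default)
bottom = df.tail(3)  # last three rows

print(df.dtypes)   # one dtype per column
print(top)
print(bottom)
print(df.index)    # row labels
print(df.columns)  # column labels
```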

Transposing (swapping rows and columns) : df.T

Selecting a single column : df["A"]

Selecting a slice of rows by position : df[0:3]

Selecting a single row by integer position : df.iloc[3]

Selecting rows and columns by integer slices : df.iloc[3:5, 0:2]
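To see what each selection returns, here is a small sketch on an assumed frame holding the integers 0 to 23, so every result can be checked by eye.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row i holds the values 4*i .. 4*i + 3.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

transposed = df.T          # columns become rows: shape (4, 6)
col_a = df["A"]            # a single column, returned as a Series
first_three = df[0:3]      # the first three rows
row3 = df.iloc[3]          # the fourth row, returned as a Series
block = df.iloc[3:5, 0:2]  # rows 3-4 and columns A-B (end of slice excluded)

print(transposed.shape)
print(col_a)
print(first_three)
print(row3)
print(block)
```

Note that df.iloc slices follow Python conventions: the end position is not included.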

Setting a new column automatically aligns the data by the indexes : 1 – create the new column as a Series : s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6)) 2 – add it to the data frame : df["F"] = s1 . Rows of df whose index label has no match in s1 get NaN.
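A sketch of index alignment, assuming df is indexed by six days starting 2013-01-01 (one day earlier than s1). The frame contents are made up; only the index matters here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame indexed 2013-01-01 .. 2013-01-06.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# s1 is indexed 2013-01-02 .. 2013-01-07, shifted by one day.
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))

# Assignment matches on index labels, not on position.
df["F"] = s1

print(df["F"])
```

The first row has no matching label in s1, so it becomes NaN, and the last value of s1 (labelled 2013-01-07) is dropped because df has no such row.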

pandas primarily uses the value np.nan to represent missing data.
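As an illustration, the sketch below finds, drops, and fills the missing value in the Series from earlier. The choice of isna, dropna, and fillna, and the fill value 0, are assumptions made for the example, not something these notes prescribe.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

mask = s.isna()       # True where the value is missing
dropped = s.dropna()  # copy without the missing entries
filled = s.fillna(0)  # copy with missing entries replaced by 0 (arbitrary choice)
total = s.sum()       # reductions skip NaN by default

print(mask)
print(dropped)
print(filled)
print(total)
```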

Operations

df.mean() : calculate the mean of each column

Applying a function to the data : df.apply(func) , passing the function itself rather than calling it
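A short sketch of both operations on a made-up two-column frame. The helper value_range is hypothetical and stands in for any function that takes a column (a Series).

```python
import pandas as pd

# Hypothetical data chosen so the results are easy to verify.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 60.0]})

col_means = df.mean()        # one mean per column
row_means = df.mean(axis=1)  # one mean per row


def value_range(column):
    """Hypothetical helper: spread between the largest and smallest value."""
    return column.max() - column.min()


# Pass the function object; apply calls it once per column.
ranges = df.apply(value_range)

print(col_means)
print(row_means)
print(ranges)
```

Writing df.apply(value_range()) would call the function immediately with no argument and fail, which is why the parentheses are left off.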