chipo.describe() # Notice: by default, only the numeric columns are returned.
chipo.describe(include="all") # Notice: include="all" returns a summary of every column, not only the numeric ones.
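The difference between the two calls is easier to see on a small frame. The sketch below uses a hypothetical stand-in for `chipo` (the rows are made up for illustration and are not the real dataset):

```python
import pandas as pd

# Hypothetical stand-in for the chipo order data (values are illustrative only)
chipo = pd.DataFrame({
    "quantity": [1, 2, 1, 7],
    "item_name": ["Izze", "Bottled Water", "Izze", "Bottled Water"],
    "item_price": ["$3.39", "$1.09", "$3.39", "$10.50"],
})

# Default: only numeric columns are summarised
numeric_summary = chipo.describe()

# include="all": every column is summarised; statistics that do not apply
# to a column (e.g. mean of a text column) are filled with NaN
full_summary = chipo.describe(include="all")

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```

With `include="all"` the result also gains rows such as `unique`, `top` and `freq`, which only make sense for non-numeric columns.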
dtype: Returns the data type of a specific column
df.col_name.dtype returns the data type of that column
df.item_price.dtype # dtype('O'), i.e. (Python) objects
Please note: dtype may be displayed as a short type code, such as the 'O' above, rather than a descriptive name.
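As a minimal sketch of what `dtype` reports, the example below builds a two-column frame with explicitly chosen dtypes. The column names mirror the ones used above, but the values are hypothetical:

```python
import pandas as pd

# Hypothetical data; dtypes are set explicitly so the result does not
# depend on platform or pandas version defaults
df = pd.DataFrame({
    "quantity": pd.Series([1, 2], dtype="int64"),
    "item_price": pd.Series(["$2.39", "$3.39"], dtype=object),
})

quantity_dtype = df.quantity.dtype
price_dtype = df.item_price.dtype

print(quantity_dtype)     # int64
print(price_dtype)        # object
print(repr(price_dtype))  # dtype('O')
```

The `'O'` shown in the last line is the short code for generic Python objects, which is how a column of price strings like these is stored when its dtype is `object`.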
loc: is label-based, which means that we have to specify the names (labels) of the rows and columns that we want to select.
Find all the rows based on 1 or more conditions in a column
# select all rows with a condition
data.loc[data.age>=15]
# select all rows with multiple conditions
data.loc[(data.age>=12) & (data.gender=='M')]
Select only required columns with conditions
# Update the values of multiple columns on selected rows
chipo.loc[(chipo.quantity==7) & (chipo.item_name=='Bottled Water'), ['item_name', 'item_price']] = ['Tra Xanh', 0]
# Select only required columns with a condition
chipo.loc[(chipo.quantity>5), ['item_name', 'quantity', 'item_price']]
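The `loc` patterns above (filter rows, pick columns, update in place) can be tried end to end on a toy frame. The `data` frame here is hypothetical, with made-up names and ages:

```python
import pandas as pd

# Hypothetical data for illustration
data = pd.DataFrame({
    "name": ["An", "Binh", "Chi", "Dung"],
    "age": [10, 12, 15, 17],
    "gender": ["M", "F", "M", "M"],
})

# Rows matching a single condition
teens = data.loc[data.age >= 15]

# Rows matching several conditions, restricted to chosen columns.
# Each condition needs its own parentheses because & binds tighter than >= and ==.
older_boys = data.loc[(data.age >= 12) & (data.gender == "M"), ["name", "age"]]

# Update one column on the selected rows only
data.loc[data.age < 12, "age"] = 12

print(teens)
print(older_boys)
print(data)
```

Note that `&` (and `|`) must be used instead of Python's `and` / `or`, since the conditions are evaluated element-wise.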
iloc
iloc: is integer position-based, which means that we have to specify the integer positions of the rows and columns that we want to select.
.iloc[] allowed inputs are:
Selecting Rows
An integer, e.g. dataset.iloc[0] > returns row 0 as a <class 'pandas.core.series.Series'>
Country      France
Age              44
Salary        72000
Purchased        No
A list or array of integers, e.g. dataset.iloc[[0]] > returns row 0 in DataFrame format
  Country   Age   Salary Purchased
0  France  44.0  72000.0        No
A slice object with ints, e.g. dataset.iloc[:3] > returns rows 0 through 2 in DataFrame format (the end position, 3, is excluded)
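The three input types can be compared side by side. In this sketch only the first row (France, 44, 72000, No) comes from the output shown above; the remaining rows are hypothetical filler:

```python
import pandas as pd

# First row matches the example output; the other rows are made up
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, 30.0, 38.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0],
    "Purchased": ["No", "Yes", "No", "No"],
})

row_as_series = dataset.iloc[0]    # single integer -> Series
row_as_frame = dataset.iloc[[0]]   # list of integers -> DataFrame
first_three = dataset.iloc[:3]     # slice -> DataFrame, end position excluded

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
print(list(first_three.index))       # [0, 1, 2]
```

Unlike `loc`, where a label slice includes its end label, an `iloc` slice follows normal Python slicing and stops before the end position.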
DataFrame.values: Returns a NumPy representation of the DataFrame (i.e. only the values in the DataFrame are returned; the axis labels are removed)
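A short sketch of what "labels removed" means in practice, using a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with custom row labels
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

arr = df.values  # plain NumPy array: no column names, no row labels

print(type(arr).__name__)  # ndarray
print(arr.tolist())        # [[1, 3], [2, 4]]
```

The pandas documentation recommends `df.to_numpy()` as the preferred way to get the same array.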
# Count how many non-null values are in the column
df.col_name.count()
# Take the mean of the values in the column
df["col_name"].mean()
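Both methods skip missing values by default, which the following sketch illustrates on a hypothetical column containing one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
df = pd.DataFrame({"col_name": [1.0, 2.0, np.nan, 3.0]})

n_rows = len(df)                     # counts every row, including the NaN row
n_values = df.col_name.count()       # counts only non-null values
mean_value = df["col_name"].mean()   # NaN is skipped: (1 + 2 + 3) / 3

print(n_rows)      # 4
print(n_values)    # 3
print(mean_value)  # 2.0
```

If the number of rows is what is needed, `len(df)` or `df.shape[0]` is the safer choice, since `count()` silently drops missing entries.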
value_counts(): Returns a Series containing counts of unique values
index=pd.Index([3, 1, 2, 3, 4, np.nan])
# dropna=False will also consider NaN as a unique value
index.value_counts(dropna=False)
# Return:
# 3.0    2
# 2.0    1
# NaN    1
# 4.0    1
# 1.0    1
# dtype: int64
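Calculate the total number of unique values in a column
# How many unique values (NaN is excluded by default)
index.value_counts().count()
index.nunique()
# 4 (index.nunique(dropna=False) returns 5, because NaN is then counted as a value)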
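The effect of `dropna` on both `value_counts()` and `nunique()` can be checked directly on the same `index` used above. This is a minimal sketch; the variable names other than `index` are illustrative:

```python
import numpy as np
import pandas as pd

index = pd.Index([3, 1, 2, 3, 4, np.nan])

counts = index.value_counts()                      # NaN dropped by default
counts_with_nan = index.value_counts(dropna=False)  # NaN kept as its own entry

n_unique = index.nunique()                          # NaN excluded by default
n_unique_with_nan = index.nunique(dropna=False)     # NaN counted as a value

print(len(counts), len(counts_with_nan))    # 4 5
print(n_unique, n_unique_with_nan)          # 4 5
print(counts[3.0])                          # 2
```

With the defaults, `index.value_counts().count()` and `index.nunique()` agree, since both ignore NaN.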