Several cheat-sheets of different topics in .md format. Checkout the Github pages version.
Import an excel file as a pandas dataframe:
import pandas as pd
path = './path_to_file'
df = pd.read_excel(path)
Export an existing dataframe df
to an excel file:
export_excel = df.to_excel(path_to_exported_file, index=None, header=True)
Given an existing dataframe df
, we can loop over the rows and access specific columns as follows:
for index, row in df.iterrows():
CUSTOMER_NAME = str(row['CUSTOMER NAME'])
DATE_OF_BIRTH = str(row['DATE OF BIRTH'])
...
Given an existing dataframe
for column in df.columns:
for index, row in df.iterrows():
df.loc[index, column] = row[column].replace("foo", "bar")
Given an existing dataframe df
and a condition, the following code filters the rows for which the condition is true:
is_it_satisfied = df["Recommendation based on Requirement"] == "yes"
df_requiredRows = df[is_it_satisfied]
...
l_years = []
for index, row in df.iterrows():
years = row['batch_ts'].year - int(row['year'])
l_years.append(years)
df['years'] = l_years
...
df = df.drop('year', axis=1)
Filter all rows which column contains a word or some text:
df[df['col_name'].apply(lambda x: True if 'word' in x.lower().split(' ') else False)]
df[df['col_name'].apply(lambda x: True if 'substring' in x.lower() else False)]
df.groupby('JobTitle')['Id'].count().sort_values(ascending=False)[0:5]
df[['col_a', 'col_b']].corr()
from pandarallel import pandarallel
pandarallel.initialize(nb_workers=10, progress_bar=True)
df['attributes'] = df['response'].parallel_apply(parse_attributes)
Return to main page