python_multiprocessing
Hyeonwoo Daniel Yoo edited this page Sep 29, 2019
·
9 revisions
- Parallel Processing in Python
- Progress Bars in Python
- Multiprocessing: use tqdm to display a progress bar
$ sudo apt-get install htop
$ pip3 install tqdm
Run the command below to see how many of your cores are busy:
$ htop
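htop shows the per-core load interactively. If you also want the number of cores from inside Python, for example to size a worker pool, the standard library can report it. A minimal sketch; the variable name n_cores is only illustrative:

```python
import os

# os.cpu_count() returns the number of logical CPUs,
# or None if it cannot be determined, hence the fallback to 1
n_cores = os.cpu_count() or 1
print("logical cores:", n_cores)
```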
Work must be distributed across the cores
- I want to draw a scatter plot for every possible pair of columns in the dataframe
- Without parallel processing, a single core draws the plots one after another, from the first pair to the last
- What I want instead is for each core to draw the scatter plot of a different pair of columns, all at the same time
- While this runs, I also want to see the progress and the estimated time remaining
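The column pairs described above can be enumerated with the standard library's itertools. This is a minimal sketch that uses hypothetical column names in place of df.columns: product yields every ordered pair, including a column paired with itself, while combinations yields each unordered pair of distinct columns once, which may be enough if mirrored plots are not needed.

```python
from itertools import combinations, product

columns = ["height", "weight", "age"]  # hypothetical stand-in for df.columns

# Every ordered pair, including (col, col): n * n tuples
all_pairs = list(product(columns, repeat=2))

# Each unordered pair of distinct columns once: n * (n - 1) / 2 tuples
unique_pairs = list(combinations(columns, 2))

print(len(all_pairs), len(unique_pairs))
```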
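from multiprocessing.pool import ThreadPool

import matplotlib.pyplot as plt
from tqdm import tqdm

# df is assumed to be an existing pandas DataFrame

# Build the list of argument tuples to process in parallel
args = []
for col1 in df.columns:
    for col2 in df.columns:
        args.append((col1, col2))

# Function run by the workers: draw and save one scatter plot
def save_scatter_matrix(col1, col2):
    # Draw on an explicit Figure/Axes; pyplot's implicit "current figure"
    # is global state shared by all threads
    fig, ax = plt.subplots(figsize=(20, 20))
    ax.scatter(df[col1], df[col2])
    ax.set_xlabel(col1)
    ax.set_ylabel(col2)
    ax.tick_params(axis='x', labelrotation=90)
    # The output directory must already exist
    fig.savefig('img/scatter_matrix/%s_%s.png' % (col1, col2))
    plt.close(fig)  # release the figure's memory
    pbar.update()  # advance the shared progress bar by one task

# Number of worker threads (set this to match your machine)
num_of_cores = 16

# Hand the tasks to the pool; keep the progress bar open until all of them finish
pool = ThreadPool(num_of_cores)
with tqdm(total=len(args)) as pbar:
    for arg in args:
        pool.apply_async(save_scatter_matrix, arg)
    pool.close()
    pool.join()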
- With tqdm set up properly, you can track the progress and see the estimated time remaining until the job finishes
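One caveat with apply_async is that an exception raised inside a worker stays hidden unless the returned result object is queried. A possible alternative, sketched here under assumptions, is to submit the tasks through concurrent.futures and advance the bar from the main thread as each task completes. The helper run_with_progress and the stand-in task join_names are hypothetical names introduced only for illustration, and the fallback branch exists only so that the sketch still runs where tqdm is not installed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm (no bar is shown)
    def tqdm(iterable, **kwargs):
        return iterable


def run_with_progress(func, arg_tuples, max_workers=4):
    """Run func(*args) for each tuple in a thread pool.

    Results are returned in completion order, not submission order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(func, *a) for a in arg_tuples]
        # as_completed yields each future as soon as its task finishes,
        # so the bar advances once per finished task
        for future in tqdm(as_completed(futures), total=len(futures)):
            results.append(future.result())  # re-raises worker exceptions
    return results


def join_names(col1, col2):
    # Hypothetical stand-in for save_scatter_matrix
    return "%s_%s" % (col1, col2)


pairs = [("a", "b"), ("a", "c"), ("b", "c")]
names = run_with_progress(join_names, pairs, max_workers=2)
```

Because the bar is updated in the main thread, the worker function no longer needs access to pbar, and a failing task stops the run with a visible traceback instead of being silently skipped.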