python_multiprocessing

Hyeonwoo Daniel Yoo edited this page Sep 29, 2019 · 9 revisions

Multiprocessing with Progress Bar

Install

$ sudo apt-get install htop
$ pip3 install tqdm

When only one of your cores is working hard... just like you on your team

Type the command below to see how many of your cores are working:

$ htop
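
htop shows core usage interactively. From Python, the number of logical cores can be read with `os.cpu_count()`, which avoids hard-coding a worker count. A minimal sketch, where `pick_worker_count` is a hypothetical helper name that does not come from this page:

```python
import os


def pick_worker_count(requested=None):
    """Return a worker count no larger than the number of logical cores."""
    available = os.cpu_count() or 1  # cpu_count() can return None
    if requested is None:
        return available
    # Clamp the request to the range [1, available]
    return max(1, min(requested, available))


print(pick_worker_count())
```

Asking for more workers than there are cores is allowed by the pool classes, but it rarely makes CPU-heavy jobs any faster, which is why the sketch clamps the value.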

Work must be distributed across all cores

Example

  • I'd like to draw a scatter plot for every possible combination of columns in the dataframe
  • Without parallel processing, only one core draws the scatter plots, working from the first combination to the last
  • What is desired is for each core to draw the scatter plot of a different combination of columns, all at the same time
  • During this process, I'd also like to see the progress and how much time remains
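
from multiprocessing.pool import ThreadPool

import matplotlib.pyplot as plt
from tqdm import tqdm

# `df` is assumed to be an existing pandas DataFrame, and the
# img/scatter_matrix/ directory is assumed to exist already

# Set combinations of arguments desired to be parallel-processed
args = []
for col1 in df.columns :
    for col2 in df.columns :
        args.append((col1, col2))

# Define function to process
def save_scatter_matrix(col1, col2) :
    # Use an explicit figure and axes rather than pyplot's shared "current figure",
    # which worker threads would otherwise overwrite for each other
    fig, ax = plt.subplots(figsize=(20, 20))
    ax.scatter(df[col1], df[col2])
    ax.set_xlabel(col1)
    ax.set_ylabel(col2)
    ax.tick_params(axis='x', labelrotation=90)
    fig.savefig('img/scatter_matrix/%s_%s.png'%(col1, col2))
    plt.close(fig)  # release the figure's memory once it is saved
    pbar.update()

    return

# Number of workers (ThreadPool starts threads inside a single process)
num_of_cores = 16

# Assign the work to the pool
pool = ThreadPool(num_of_cores)
with tqdm(total=len(args)) as pbar:
    results = []
    for i in range(len(args)):
        results.append(pool.apply_async(save_scatter_matrix, args[i]))
    pool.close()
    pool.join()
    # get() re-raises any exception from a worker; apply_async hides it otherwise
    for result in results:
        result.get()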
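
The nested loop over `df.columns` builds every ordered pair, so each column is also paired with itself and every plot appears twice with the axes swapped. If only distinct, unordered pairs are wanted, `itertools.combinations` is one option. A small sketch with placeholder column names (these are assumptions, not the original data):

```python
from itertools import combinations

# Placeholder column names, standing in for df.columns
columns = ["height", "weight", "age"]

# What the nested loop produces: all ordered pairs, including (x, x)
ordered_pairs = [(c1, c2) for c1 in columns for c2 in columns]

# Distinct unordered pairs only
unordered_pairs = list(combinations(columns, 2))

print(len(ordered_pairs))  # 9
print(unordered_pairs)     # [('height', 'weight'), ('height', 'age'), ('weight', 'age')]
```

For n columns this cuts the number of plots from n * n to n * (n - 1) / 2.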

Now everyone is unhappy. Just like you :)

With Progress Bar

  • With tqdm and the proper code, you can check the progress and how much time remains until the job finishes
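
One common way to get both separate worker processes and a progress bar is to let the parent process iterate over `Pool.imap_unordered` and wrap that iterator in tqdm. The sketch below is an illustration under assumptions, not this page's original method: `describe_pair` is a hypothetical stand-in for the plotting job, and `run_with_progress` and its `start_method` parameter are made-up names.

```python
import multiprocessing as mp
import time

try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs where tqdm is missing (no bar is shown)
    def tqdm(iterable, **kwargs):
        return iterable


def describe_pair(pair):
    """Hypothetical stand-in for the plotting job; takes one (col1, col2) tuple."""
    col1, col2 = pair
    time.sleep(0.01)  # pretend to do some work
    return "%s_%s" % (col1, col2)


def run_with_progress(func, items, workers=2, start_method=None):
    """Run func over items in worker processes and show a progress bar."""
    items = list(items)
    ctx = mp.get_context(start_method)  # None means the platform default
    results = []
    with ctx.Pool(processes=workers) as pool:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so the bar advances in the parent, one tick per finished job
        for result in tqdm(pool.imap_unordered(func, items), total=len(items)):
            results.append(result)
    return results
```

Call it as `run_with_progress(describe_pair, pairs, workers=4)`, ideally under an `if __name__ == "__main__":` guard so that child processes can import the script safely. Results arrive in completion order, not input order, and an exception raised in a worker is re-raised in the parent when its result is reached.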