Commit

split benchmark file in chunks
cbouy committed Mar 20, 2022
1 parent f45ce4e commit 7cc1da5
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions scripts/split_in_chunks.sh
@@ -0,0 +1,18 @@
#!/bin/bash

IN_FILE="chembl_processed_unique.smi.gz"
DIR=$PWD

# go to data dir
cd "$(dirname "$0")/../data" || exit 1

echo Splitting processed file in chunks of 200,000 lines
mkdir -p chunks
zcat "${IN_FILE}" | split -d -a 1 --additional-suffix .smi -l 200000 - chunks/part

# counts
cd chunks
wc -l *.smi | grep '\.smi$' | awk '{print $1 > (".count_" $2)}'

cd "$DIR"
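
One way to sanity-check the chunking is to confirm that the per-chunk counts add up to the line count of the uncompressed input. The sketch below is not part of this commit: `split_and_check` is a hypothetical helper, the demo input is generated with `seq`, and `gzip -dc` stands in for `zcat`. It assumes GNU `split`, since `-d` and `--additional-suffix` are GNU extensions.

```bash
#!/bin/bash
# Hypothetical helper (not in the commit): split a gzipped file into chunks,
# write one .count_<chunk> file per chunk, and verify the total line count.
split_and_check() {
    local in_file=$1 lines=$2 out_dir=$3
    mkdir -p "$out_dir"
    # Same split flags as the script; -a 1 allows at most 10 numeric suffixes.
    gzip -dc "$in_file" | split -d -a 1 --additional-suffix .smi -l "$lines" - "$out_dir/part"
    local total=0 f n expected
    for f in "$out_dir"/part*.smi; do
        n=$(( $(wc -l < "$f") ))
        echo "$n" > "$out_dir/.count_$(basename "$f")"
        total=$((total + n))
    done
    # Compare the summed chunk sizes with the uncompressed input.
    expected=$(( $(gzip -dc "$in_file" | wc -l) ))
    [ "$total" -eq "$expected" ]
}

# Demo on a 25-line stand-in file, split into chunks of 10 lines.
tmp=$(mktemp -d)
seq 1 25 | gzip > "$tmp/demo.smi.gz"
split_and_check "$tmp/demo.smi.gz" 10 "$tmp/chunks" && echo "counts match"
```

With a one-character suffix, `split` can only name ten chunks (`part0` to `part9`), so an input longer than 2,000,000 lines at 200,000 lines per chunk would need a larger `-a` value.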
