
Release 2.1.0 from develop. #239

Merged · 40 commits · Apr 19, 2024

Commits
16554b9
Bumped new version and included mail_password into yaml for bioinfo-doc
Shettland Mar 5, 2024
28030e2
Included blast template
Shettland Mar 5, 2024
0a02b2a
Included default CC and option to use text file for delivery notes
Shettland Mar 5, 2024
d173457
Bioinfo-doc updates; linting
Shettland Mar 5, 2024
be88b6b
Updated CHANGELOG.md
Shettland Mar 5, 2024
7785353
Updated CHANGELOG.md. v2
Shettland Mar 5, 2024
e540731
Updated CHANGELOG.md. v3
Shettland Mar 5, 2024
5302f61
Small fix in autoclean_sftp, dflt missing
Shettland Mar 11, 2024
692fea9
Small fix to correctly attach multiple CCs
Shettland Mar 11, 2024
5161295
Included new user in sftp_user.json
Shettland Mar 11, 2024
2a742f9
excel_generator.py reverted, argument -l included
Shettland Mar 11, 2024
7c63796
Adapted viralrecon_results lablog to new excel_generator, also commen…
Shettland Mar 11, 2024
6d879d2
Updated changelog
Shettland Mar 11, 2024
c746d4f
excel_generator.py reverted, -l included. Linting
Shettland Mar 11, 2024
313d117
updated regex
Daniel-VM Mar 15, 2024
dfe3e6a
added missing url
Daniel-VM Mar 15, 2024
308c593
update change log
Daniel-VM Mar 15, 2024
f7cbfc3
Removed empty strings from folders to clean as they were leading to t…
Shettland Mar 20, 2024
0eff368
Fixed a missing sed in _03_post_processing step
Shettland Mar 20, 2024
e955a25
Included line to create a summary of flu types from irma_stats.txt
Shettland Mar 20, 2024
588f1d1
Fixed missing strings in services.json, updated CHANGELOG
Shettland Mar 20, 2024
269cea2
Updated IRMA location and reduced threads to 16
Shettland Mar 22, 2024
aa4e1ec
Updated IRMA location and reduced threads to 16. Updated CHANGELOG
Shettland Mar 22, 2024
640fe87
Updated IRMA version. Updated CHANGELOG
Shettland Mar 25, 2024
db77078
IRMA: Create C-B dirs only if those flu types are present
Shettland Mar 27, 2024
b4cc501
Fixed singularity mount problem in nextflow v23.10.0
Shettland Mar 27, 2024
01c6d81
Commented auxiliar code
Shettland Mar 27, 2024
b93da68
Fixed singularity mount problem in nextflow v23.10.0 in viralrecon te…
Shettland Mar 27, 2024
8641e7a
Fixed paths to trimmed reads
svarona Apr 9, 2024
e0a87be
added ampersand to srun
svarona Apr 9, 2024
5277c3f
added changes to changelog
svarona Apr 9, 2024
b7589c0
Fixed lablog
svarona Apr 12, 2024
69bfbac
Updated resuls lablog
svarona Apr 12, 2024
e3e5c6c
Added line to process mlst results
svarona Apr 12, 2024
2babe70
Removed unnecesaaaary readsmes
svarona Apr 12, 2024
730bb69
Added code to download pubmlst reference
svarona Apr 12, 2024
a1683b3
Added code for pubmlst process with ariba
svarona Apr 12, 2024
8016898
created chewbbaca template
svarona Apr 12, 2024
7931f3d
Prepare release 2.1.0
Shettland Apr 19, 2024
44e0bad
Prepare Release 2.1.0. Included reviewer changes
Shettland Apr 19, 2024
48 changes: 46 additions & 2 deletions CHANGELOG.md
@@ -4,11 +4,11 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.1.0dev] - 2024-0X-0X : https://github.com/BU-ISCIII/buisciii-tools/releases/tag/2.1.X
## [2.2.Xdev] - 2024-0X-XX : https://github.com/BU-ISCIII/buisciii-tools/releases/tag/2.2.X

### Credits

Code contributions to the hotfix:
Code contributions to the new version:

### Template fixes and updates

@@ -44,6 +44,50 @@ Code contributions to the hotfix:

### Requirements

## [2.1.0] - 2024-04-19 : https://github.com/BU-ISCIII/buisciii-tools/releases/tag/2.1.0

### Credits

Code contributions to the new version:
- [Sarai Varona](https://github.com/svarona)
- [Pablo Mata](https://github.com/Shettland)
- [Daniel Valle](https://github.com/Daniel-VM)

### Template fixes and updates

- Added blast_nt template to services.json [#208](https://github.com/BU-ISCIII/buisciii-tools/pull/208)
- Included new user to sftp_user.json
- Included a missing sed inside IRMA's 04-irma/lablog [#213](https://github.com/BU-ISCIII/buisciii-tools/pull/213)
- Changed singularity mount options in Viralrecon template to fix errors with Nextflow v23.10.0
- excel_generator.py reverted to last state, now lineage tables are merged when argument -l is given
- Adapted viralrecon_results lablog to new excel_generator.py argument
- IRMA/RESULTS now creates a summary of the different types of flu found in irma_stats.txt
- Updated IRMA to v1.1.4 date 02-2024 and reduced threads to 16
- IRMA 04-irma/lablog now creates B and C dirs only if those flu-types are present
- Fixed characterization template [#220](https://github.com/BU-ISCIII/buisciii-tools/pull/220)
- Created Chewbbaca template [#230](https://github.com/BU-ISCIII/buisciii-tools/pull/230)
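The flu-type summary mentioned above can be sketched on a toy `irma_stats.txt` (the column layout is an assumption for illustration; the real file is produced by `create_irma_stats.sh`):

```shell
# Toy irma_stats.txt (assumption: the flu type sits in column 4,
# matching the tail/cut pipeline in the RESULTS lablog).
printf 'sample\treads\tfragments\tflu_type\n' > irma_stats.txt
printf 's1\t100\t8\tA_H3N2\n' >> irma_stats.txt
printf 's2\t90\t8\tA_H3N2\n' >> irma_stats.txt
printf 's3\t80\t8\tB\n' >> irma_stats.txt

# Same idea as the RESULTS lablog: skip the header, count each flu type.
tail -n +2 irma_stats.txt | cut -f4 | sort | uniq -c > flu_type_summary.txt
```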

### Modules

#### Added enhancements

- [#207](https://github.com/BU-ISCIII/buisciii-tools/pull/207) - Bioinfo-doc updates: email password can be given in buisciii_config.yml and delivery notes in a text file

#### Fixes

- Added missing url for service assembly_annotation in module list
- Autoclean-sftp: refined folder-name parsing by adjusting the regex label
- Autoclean-sftp no longer crashes: the 'dflt' argument required by 'utils.prompt_yn_question()' since v2.0.0 was missing
- Bioinfo-doc now sends email correctly to multiple CCs
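The autoclean-sftp crash came from calling the prompt helper without the default it now requires. A minimal stand-in of that behavior (the real `bu_isciii.utils.prompt_yn_question` wraps an interactive rich prompt; the `answer` parameter is hypothetical, added here only to keep the sketch non-interactive):

```python
def prompt_yn_question(msg, dflt=None, answer=None):
    """Return True/False for a y/n prompt, falling back to `dflt` on empty input."""
    if not answer:
        if dflt is None:
            # The failure mode fixed in this release: no input and no default set.
            raise ValueError(f"{msg!r}: no answer given and no default set")
        return dflt
    return answer.strip().lower() in ("y", "yes")
```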

#### Changed

#### Removed

- Removed empty strings from services.json

### Requirements

## [2.0.0] - 2024-03-01 : https://github.com/BU-ISCIII/buisciii-tools/releases/tag/2.0.0

### Credits
2 changes: 1 addition & 1 deletion README.md
@@ -110,7 +110,7 @@ Output:
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Service name ┃ Description ┃ Github ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ assembly_annotation │ Nextflow assembly pipeline to assemble │
│ assembly_annotation │ Nextflow assembly pipeline to assemble │ https://github.com/Daniel-VM/bacass/...
│ │ bacterial genomes │ │
│ mtbseq_assembly │ Mycobacterium tuberculosis mapping, │ https://github.com/ngs-fzb/MTBseq_source │
│ │ variant calling and detection of │ │
6 changes: 4 additions & 2 deletions bu_isciii/__main__.py
100644 → 100755
@@ -55,7 +55,7 @@ def run_bu_isciii():
)

# stderr.print("[green] `._,._,'\n", highlight=False)
__version__ = "1.0.1"
__version__ = "2.0.0"
stderr.print(
"[grey39] BU-ISCIII-tools version {}".format(__version__), highlight=False
)
@@ -507,6 +507,7 @@ def bioinfo_doc(
"""
Create the folder documentation structure in bioinfo_doc server
"""
email_pass = email_psswd if email_psswd else ctx.obj.get("email_password")
new_doc = bu_isciii.bioinfo_doc.BioinfoDoc(
type,
resolution,
@@ -517,7 +518,7 @@ def bioinfo_doc(
results_md,
ctx.obj["api_user"],
ctx.obj["api_password"],
email_psswd,
email_pass,
)
new_doc.create_documentation()
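The fallback added in this hunk can be isolated as a one-liner (assumption: `ctx.obj` is the dict Click builds from `buisciii_config.yml`, with the password under the `email_password` key):

```python
def resolve_email_password(email_psswd, ctx_obj):
    # A password passed on the command line wins; otherwise fall back to the
    # email_password entry of the loaded config (None when absent).
    return email_psswd if email_psswd else ctx_obj.get("email_password")
```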

@@ -564,6 +565,7 @@ def bioinfo_doc(
default=None,
help="Tsv output path + filename with archive stats and info",
)
@click.pass_context
def archive(
ctx,
service_id,
10 changes: 7 additions & 3 deletions bu_isciii/autoclean_sftp.py
100644 → 100755
@@ -68,7 +68,9 @@ class AutoremoveSftpService:
def __init__(self, path=None, days=14):
# Parse input path
if path is None:
use_default = bu_isciii.utils.prompt_yn_question("Use default path?: ")
use_default = bu_isciii.utils.prompt_yn_question(
"Use default path?: ", dflt=False
)
if use_default:
data_path = bu_isciii.config_json.ConfigJson().get_configuration(
"global"
@@ -107,7 +109,7 @@ def check_path_exists(self):
def get_sftp_services(self):
self.sftp_services = {} # {sftp-service_path : last_update}
service_pattern = (
r"^[SRV][A-Z]+[0-9]+_\d{8}_[A-Z0-9]+_[a-zA-Z]+(?:\.[a-zA-Z]+)?_[a-zA-Z]$"
r"^[SRV][A-Z]+[0-9]+_\d{8}_[A-Z0-9.-]+_[a-zA-Z]+(?:\.[a-zA-Z]+)?_[a-zA-Z]$"
)
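The widened character class `[A-Z0-9.-]` lets the run-ID field of a service folder carry dots and dashes. A quick check against the old pattern (the folder name is invented for illustration):

```python
import re

# Refined pattern from this PR.
service_pattern = (
    r"^[SRV][A-Z]+[0-9]+_\d{8}_[A-Z0-9.-]+_[a-zA-Z]+(?:\.[a-zA-Z]+)?_[a-zA-Z]$"
)
# Previous pattern: run-ID field limited to [A-Z0-9].
old_pattern = (
    r"^[SRV][A-Z]+[0-9]+_\d{8}_[A-Z0-9]+_[a-zA-Z]+(?:\.[a-zA-Z]+)?_[a-zA-Z]$"
)

name = "SRVCNM123_20240419_NC-1.2_jdoe.smith_S"  # hypothetical sftp folder
assert re.match(service_pattern, name)      # new regex accepts dash and dot
assert not re.match(old_pattern, name)      # old regex rejected this name
```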

stderr.print("[blue]Scanning " + self.path + "...")
@@ -149,7 +151,9 @@ def remove_oldservice(self):
"The following services are going to be deleted from the sftp:\n"
+ service_elements
)
confirm_sftp_delete = bu_isciii.utils.prompt_yn_question("Are you sure?: ")
confirm_sftp_delete = bu_isciii.utils.prompt_yn_question(
"Are you sure?: ", dflt=False
)
if confirm_sftp_delete:
for service in self.marked_services:
try:
61 changes: 45 additions & 16 deletions bu_isciii/bioinfo_doc.py
100644 → 100755
@@ -262,13 +262,34 @@ def create_structure(self):
return

def post_delivery_info(self):
delivery_notes = bu_isciii.utils.ask_for_some_text(
msg="Write some delivery notes:"
)
if bu_isciii.utils.prompt_yn_question(
msg="Do you wish to provide a text file for delivery notes?", dflt=False
):
for i in range(3, -1, -1):
self.provided_txt = bu_isciii.utils.prompt_path(
msg="Write the path to the file with RAW text as delivery notes"
)
if not os.path.isfile(os.path.expanduser(self.provided_txt)):
stderr.print(f"Provided file doesn't exist. Attempts left: {i}")
else:
stderr.print(f"File selected: {self.provided_txt}")
break
else:
stderr.print("No more attempts. Delivery notes will be given by prompt")
self.provided_txt = None
else:
self.provided_txt = None

if self.provided_txt:
with open(os.path.expanduser(self.provided_txt)) as f:
self.delivery_notes = " ".join([x.strip() for x in f.readlines()])
else:
self.delivery_notes = bu_isciii.utils.ask_for_some_text(
msg="Write some delivery notes:"
)
delivery_dict = {
"resolution_number": self.resolution_id,
"delivery_notes": delivery_notes,
"delivery_notes": self.delivery_notes,
}

# How json should be fully formatted:
@@ -568,9 +589,15 @@ def email_creation(self):
if bu_isciii.utils.prompt_yn_question(
"Do you want to add some delivery notes to the e-mail?", dflt=False
):
email_data["email_notes"] = bu_isciii.utils.ask_for_some_text(
msg="Write email notes"
)
if self.provided_txt:
if bu_isciii.utils.prompt_yn_question(
f"Do you want to use notes from {self.provided_txt}?", dflt=False
):
email_data["email_notes"] = self.delivery_notes
else:
email_data["email_notes"] = bu_isciii.utils.ask_for_some_text(
msg="Write email notes"
)

email_data["user_data"] = self.resolution_info["service_user_id"]
email_data["service_id"] = self.service_name.split("_", 5)[0]
@@ -604,7 +631,7 @@ def send_email(self, html_text, results_pdf_file):
server.login(user=email_host_user, password=email_host_password)
except Exception as e:
stderr.print("[red] Unable to send e-mail" + e)

default_cc = "[email protected]"
msg = MIMEMultipart("alternative")
msg["To"] = self.resolution_info["service_user_id"]["email"]
msg["From"] = email_host_user
@@ -617,18 +644,21 @@
+ self.service_name.split("_", 5)[2]
)
if bu_isciii.utils.prompt_yn_question(
"Do you want to add any other sender? appart from "
+ self.resolution_info["service_user_id"]["email"],
"Do you want to add any other sender? apart from %s. Note: %s is the default CC."
% (self.resolution_info["service_user_id"]["email"], default_cc),
dflt=False,
):
stderr.print(
"[red] Write emails to be added in semicolon separated format: bioinformatica@isciii.es;icuesta@isciii.es"
"[red] Write emails to be added in semicolon separated format: icuesta@isciii.es;user2@isciii.es"
)
msg["CC"] = bu_isciii.utils.ask_for_some_text(msg="E-mails:")
rcpt = msg["CC"].split(";") + [msg["To"]]
cc_address = bu_isciii.utils.ask_for_some_text(msg="E-mails:")
else:
rcpt = self.resolution_info["service_user_id"]["email"]

cc_address = str()
if cc_address:
msg["CC"] = str(default_cc + ";" + str(cc_address))
else:
msg["CC"] = default_cc
rcpt = msg["CC"].split(";") + [msg["To"]]
html = MIMEText(html_text, "html")
msg.attach(html)
with open(results_pdf_file, "rb") as f:
@@ -639,7 +669,6 @@ def send_email(self, html_text, results_pdf_file):
filename=str(os.path.basename(results_pdf_file)),
)
msg.attach(attach)

server.sendmail(
email_host_user,
rcpt,
@@ -5,17 +5,17 @@ mkdir logs

scratch_dir=$(echo $PWD | sed "s/\/data\/bi\/scratch_tmp/\/scratch/g")

cat ../samples_id.txt | while read in; do echo "srun --partition short_idx --cpus-per-task 32 --mem 35000M --chdir $scratch_dir --time 01:00:00 --output logs/IRMA.${in}.%j.log /data/bi/pipelines/flu-amd/IRMA FLU_AD ../02-preprocessing/${in}/${in}_R1_filtered.fastq.gz ../02-preprocessing/${in}/${in}_R2_filtered.fastq.gz ${in} &"; done > _01_irma.sh
cat ../samples_id.txt | while read in; do echo "srun --partition short_idx --cpus-per-task 16 --mem 35000M --chdir $scratch_dir --time 01:00:00 --output logs/IRMA.${in}.%j.log /data/bi/pipelines/flu-amd-202402/IRMA FLU_AD ../02-preprocessing/${in}/${in}_R1_filtered.fastq.gz ../02-preprocessing/${in}/${in}_R2_filtered.fastq.gz ${in} &"; done > _01_irma.sh

echo 'bash create_irma_stats.sh' > _02_create_stats.sh

echo "ls */*HA*.fasta | cut -d '/' -f2 | cut -d '.' -f1 | sort -u | cut -d '_' -f3 | sed '/^\$/d' | sed 's/^/A_/g' > HA_types.txt" > _03_post_processing.sh

echo "cat HA_types.txt | while read in; do mkdir \${in}; done" >> _03_post_processing.sh

echo "mkdir B" >> _03_post_processing.sh
echo "if grep -qw 'B__' irma_stats.txt; then mkdir B; fi" >> _03_post_processing.sh

echo "mkdir C" >> _03_post_processing.sh
echo "if grep -qw 'C__' irma_stats.txt; then mkdir C; fi" >> _03_post_processing.sh

echo "ls */*.fasta | cut -d '/' -f2 | cut -d '.' -f1 | cut -d '_' -f1,2 | sort -u | grep 'A_' > A_fragment_list.txt" >> _03_post_processing.sh

@@ -29,7 +29,7 @@ echo 'grep -w 'B__' irma_stats.txt | cut -f1 | while read sample; do cat B_fragm

echo 'grep -w 'C__' irma_stats.txt | cut -f1 | while read sample; do cat C_fragment_list.txt | while read fragment; do if test -f ${sample}/${fragment}*.fasta; then cat ${sample}/${fragment}*.fasta | sed "s/^>/\>${sample}_/g" | sed s/_H1//g | sed s/_H3//g | sed s/_N1//g | sed s/_N2//g | sed s@-@/@g | sed s/_C_/_/g ; fi >> C/${fragment}.txt; done; done' >> _03_post_processing.sh

echo 'cat ../samples_id.txt | while read in; do cat ${in}/*.fasta | sed "s/^>/\>${in}_/g" | sed 's/_H1//g' | sed 's/_H3//g' | sed 's/_N1//g' | sed 's/_N2//g' | sed 's@-@/@g' | 's/_A_/_/g' | sed 's/_B_/_/g' | sed 's/_C_/_/g' >> all_samples_completo.txt; done' >> _03_post_processing.sh
echo 'cat ../samples_id.txt | while read in; do cat ${in}/*.fasta | sed "s/^>/\>${in}_/g" | sed 's/_H1//g' | sed 's/_H3//g' | sed 's/_N1//g' | sed 's/_N2//g' | sed 's@-@/@g' | sed 's/_A_/_/g' | sed 's/_B_/_/g' | sed 's/_C_/_/g' >> all_samples_completo.txt; done' >> _03_post_processing.sh

echo 'sed -i "s/__//g" irma_stats.txt' >> _03_post_processing.sh
echo 'sed -i "s/_\t/\t/g" irma_stats.txt' >> _03_post_processing.sh
echo 'sed "s/__//g" irma_stats.txt > clean_irma_stats.txt' >> _03_post_processing.sh
echo 'sed -i "s/_\t/\t/g" clean_irma_stats.txt' >> _03_post_processing.sh
3 changes: 2 additions & 1 deletion bu_isciii/templates/IRMA/RESULTS/irma_results
@@ -7,4 +7,5 @@ ln -s ../../ANALYSIS/*_MET/99-stats/multiqc_report.html ./krona_results.html
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/all_samples_completo.txt .
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/A_H* .
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/B .
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/C .
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/C .
tail -n +2 ../../ANALYSIS/*_FLU_IRMA/04-irma/clean_irma_stats.txt | cut -f4 | sort | uniq -c > flu_type_summary.txt
@@ -1,6 +1,6 @@
# module load fastp
# if assembly pipeline was performed first and the trimmed sequences were saved, this should work:
# cat ../samples_id | xargs -I mkdir @@; cd $_; ln -s ../../*/01-preprocessing/trimmed_sequences/@@*.gz @@; cd -
# cat ../samples_id.txt | xargs -I @@ mkdir @@; cd @@; ln -s ../../../*/01-processing/fastp/@@_1.fastp.fastq.gz ./@@_R1_filtered.fastq.gz; ln -s ../../../*/01-processing/fastp/@@_2.fastp.fastq.gz ./@@_R2_filtered.fastq.gz ; cd -
# else:
mkdir logs
scratch_dir=$(echo $(pwd) | sed 's@/data/bi/scratch_tmp/@/scratch/@g')
@@ -1,13 +1,16 @@
# conda activate ariba
# ARIBA runs local assemblies from the trimmed reads.

mkdir logs
scratch_dir=$(echo $PWD | sed 's/\/data\/bi\/scratch_tmp/\/scratch/g')
downloaded_ref=$(find ../../../../REFERENCES/ -type d -name 'ref_db')

# Cartesian product of the two files to avoid double looping
join -j 2 ../../samples_id.txt ../databases.txt | sed 's/^ //g' > sample_database.txt

# col 1 (arr[0]): sample
# col 2 (arr[1]): database
cat sample_database.txt | while read in; do arr=($in); echo "mkdir -p ${arr[0]}; srun --chdir $scratch_dir --output logs/ARIBA${arr[0]}_${arr[1]}.%j.log --job-name ARIBA_${arr[0]}_${arr[1]} --cpus-per-task 5 --mem 5G --partition short_idx --time 02:00:00 ariba run /data/bi/references/ariba/20211216/${arr[1]}/out.${arr[1]}.prepareref ../../../*ASSEMBLY/01-preprocessing/trimmed_sequences/${arr[0]}_1.trim.fastq.gz ../../../*ASSEMBLY/01-preprocessing/trimmed_sequences/${arr[0]}_2.trim.fastq.gz ${arr[0]}/out_${arr[1]}_${arr[0]}_run &"; done > _01_ariba.sh
cat sample_database.txt | while read in; do arr=($in); echo "mv ${arr[0]}/out_${arr[1]}_${arr[0]}_run/report.tsv ${arr[0]}/out_${arr[1]}_${arr[0]}_run/${arr[0]}_${arr[1]}_report.tsv"; done > _02_fix_tsvreport.sh
cat sample_database.txt | grep -v 'pubmlst' | while read in; do arr=($in); echo "mkdir -p ${arr[0]}; srun --chdir $scratch_dir --output logs/ARIBA_${arr[0]}_${arr[1]}.%j.log --job-name ARIBA_${arr[0]}_${arr[1]} --cpus-per-task 5 --mem 5G --partition short_idx --time 02:00:00 ariba run /data/bi/references/ariba/20211216/${arr[1]}/out.${arr[1]}.prepareref ../../01-preprocessing/${arr[0]}/${arr[0]}_R1_filtered.fastq.gz ../../01-preprocessing/${arr[0]}/${arr[0]}_R2_filtered.fastq.gz ${arr[0]}/out_${arr[1]}_${arr[0]}_run &"; done > _01_ariba.sh

cat ../samples_id.txt | while read in; do echo "mkdir -p ${in}; srun --chdir $scratch_dir --output logs/ARIBA_${in}_pubmlst.%j.log --job-name ARIBA_${in}_pubmlst --cpus-per-task 5 --mem 5G --partition short_idx --time 02:00:00 ariba run ${downloaded_ref} ../../01-preprocessing/${in}/${in}_R1_filtered.fastq.gz ../../01-preprocessing/${in}/${in}_R2_filtered.fastq.gz ${in}/out_pubmlst_${in}_run &"; done >> _01_ariba.sh

cat sample_database.txt | while read in; do arr=($in); echo "mv ${arr[0]}/out_${arr[1]}_${arr[0]}_run/report.tsv ${arr[0]}/out_${arr[1]}_${arr[0]}_run/${arr[0]}_${arr[1]}_report.tsv"; done > _02_fix_tsvreport.sh
@@ -8,4 +8,4 @@ scratch_dir=$(echo $PWD | sed 's/\/data\/bi\/scratch_tmp/\/scratch/g')
# 1 - Use the ls in parenthesis to find the reports for a certain db, and xargs to make it into a single line
# 2 - Integrate this into the ariba summary command

cat ../databases.txt | while read in; do echo "srun --chdir $scratch_dir --output logs/ARIBA_SUMMARY_${in}.log --job-name ARIBA_${in} --cpus-per-task 5 --mem 5G --partition short_idx --time 00:30:00 ariba summary --cluster_cols ref_seq,match out_summary_${in} $(ls ../run/*/out*_${in}*/*${in}*_report.tsv | xargs)"; done > _01_ariba_summary_prueba.sh
cat ../databases.txt | while read in; do echo "srun --chdir $scratch_dir --output logs/ARIBA_SUMMARY_${in}.log --job-name ARIBA_${in} --cpus-per-task 5 --mem 5G --partition short_idx --time 00:30:00 ariba summary --cluster_cols ref_seq,match out_summary_${in} $(ls ../run/*/out*_${in}*/*${in}*_report.tsv | xargs) &"; done > _01_ariba_summary_prueba.sh
@@ -2,3 +2,5 @@
python3 /data/bi/pipelines/bacterial_qc/parse_ariba.py --path ../02-ariba/summary/out_summary_card.csv --database card --output_bn ariba_card.bn --output_csv ariba_card.csv
python3 /data/bi/pipelines/bacterial_qc/parse_ariba.py --path ../02-ariba/summary/out_summary_plasmidfinder.csv --database plasmidfinder --output_bn ariba_plasmidfinder.bn --output_csv ariba_plasmidfinder.csv
python3 /data/bi/pipelines/bacterial_qc/parse_ariba.py --path ../02-ariba/summary/out_summary_vfdb_full.csv --database vfdb_full --output_bn ariba_vfdb_full.bn --output_csv ariba_vfdb_full.csv

paste <(echo "sample_id") <(cat ../02-ariba/run/*/out_pubmlst_*_run/mlst_report.tsv | head -n1) > ariba_mlst_full.tsv; cat ../samples_id.txt | while read in; do paste <(echo ${in}) <(tail -n1 ../02-ariba/run/${in}/out_pubmlst_${in}_run/mlst_report.tsv); done >> ariba_mlst_full.tsv
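The added one-liner stitches the per-sample `mlst_report.tsv` files into a single table. The same logic on a toy layout (sample names and allele columns are invented; paths are shortened relative to a scratch directory):

```shell
# Fake the template's directory layout with two samples.
mkdir -p 02-ariba/run/s1/out_pubmlst_s1_run 02-ariba/run/s2/out_pubmlst_s2_run
printf 'ST\tadk\tfumC\n12\t1\t4\n' > 02-ariba/run/s1/out_pubmlst_s1_run/mlst_report.tsv
printf 'ST\tadk\tfumC\n7\t2\t3\n'  > 02-ariba/run/s2/out_pubmlst_s2_run/mlst_report.tsv
printf 's1\ns2\n' > samples_id.txt

# Header row: "sample_id" pasted onto the first report's header line.
paste <(echo "sample_id") <(cat 02-ariba/run/*/out_pubmlst_*_run/mlst_report.tsv | head -n1) > ariba_mlst_full.tsv
# One row per sample: the sample name pasted onto the last line of its report.
cat samples_id.txt | while read in; do
  paste <(echo ${in}) <(tail -n1 02-ariba/run/${in}/out_pubmlst_${in}_run/mlst_report.tsv)
done >> ariba_mlst_full.tsv
```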
@@ -1,6 +1,2 @@
ln -s ../samples_id.txt .
ln -s ../00-reads .
mkdir 01-preprocessing
mkdir 02-srst2
mkdir 03-ariba
mkdir 99-stats