Running ViromeXplore
===============================

Running the Workflows
----------------------

**Usage**

To run the workflows, use the following commands:

.. code-block:: bash

   nextflow run ViromeXplore.nf --pipeline qc_classify --reads "basename_{1,2}.fastq"
   nextflow run ViromeXplore.nf --pipeline viral_assembly --reads "basename_{1,2}.fastq"
   nextflow run ViromeXplore.nf --pipeline find_viruses --contigs contigs.fasta
   nextflow run ViromeXplore.nf --pipeline high_quality_genomes --reads "basename_{1,2}.fastq" --contigs contigs.fasta --viral_contigs viral_contigs.fasta
   nextflow run ViromeXplore.nf --pipeline taxonomy_annotation --viral_contigs viral_contigs_or_genomes.fasta
   nextflow run ViromeXplore.nf --pipeline host_prediction --phylogeny viral_phylogeny.nwk --taxonomy host_taxonomy.tsv --matrix virus_host_abundances.tsv

Containers are available for all processes. Use the appropriate profile to run the workflows:

- For Docker: ``-profile docker``
- For Singularity (default): ``-profile singularity``

If using a cluster system (e.g., SLURM), you can combine profiles to configure resource usage. To do this, modify the ``config/local.config`` file and run using:

.. code-block:: bash

   -profile singularity,slurm
   -profile docker,slurm

Make sure to include the selected profile when running the workflow.
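
As a sketch, a complete invocation with a combined profile could look like the following. The command is kept in a Bash array so it can be printed and checked before anything is launched; the read file names are placeholders, and ``-profile`` is Nextflow's own option, so it takes a single dash.

.. code-block:: bash

   #!/usr/bin/env bash
   # Placeholder inputs; adjust the pipeline and file names to your data.
   cmd=(
     nextflow run ViromeXplore.nf
     --pipeline viral_assembly
     --reads "basename_{1,2}.fastq"
     -profile singularity,slurm
   )

   # Print the assembled command for inspection.
   printf '%s\n' "${cmd[*]}"

   # Uncomment to actually launch the workflow:
   # "${cmd[@]}"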

Mandatory Arguments
-------------------

- ``--pipeline``  
  Specifies the pipeline to run. Valid options:  
  ``qc_classify``, ``viral_assembly``, ``find_viruses``, ``high_quality_genomes``, ``taxonomy_annotation``, ``host_prediction``

**For the qc_classify and viral_assembly pipelines:**

- ``--reads``  
  Input reads in FASTQ format, e.g.:  
  ``basename_{1,2}.fastq``
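
The quotes around the read pattern matter in Bash: an unquoted ``{1,2}`` is brace-expanded by the shell into two separate arguments before Nextflow sees it. The sketch below shows the difference. It assumes, as the quoted usage examples suggest, that the workflow expects to receive the pattern itself.

.. code-block:: bash

   #!/usr/bin/env bash
   # Unquoted: Bash expands the braces into two separate words.
   unquoted=(basename_{1,2}.fastq)
   echo "unquoted -> ${#unquoted[@]} argument(s): ${unquoted[*]}"

   # Quoted: the pattern is passed through as a single argument.
   quoted=("basename_{1,2}.fastq")
   echo "quoted   -> ${#quoted[@]} argument(s): ${quoted[*]}"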

**For the find_viruses pipeline:**

- ``--contigs``  
  Contigs file in FASTA format, e.g.:  
  ``contigs.fasta``

**For the high_quality_genomes pipeline:**

- ``--reads``  
  Input reads in FASTQ format: ``basename_{1,2}.fastq``  
- ``--contigs``  
  Contigs file obtained from assembly: ``contigs.fasta``  
- ``--viral_contigs``  
  Viral contigs or genomes: ``viral_contigs.fasta``

**For the taxonomy_annotation pipeline:**

- ``--viral_contigs``  
  Viral contigs or genomes: ``viral_contigs_or_genomes.fasta``

**For the host_prediction pipeline:**

- ``--phylogeny``  
  Phylogenetic tree of the viruses (Newick format): ``viral_phylogeny.nwk``  
- ``--taxonomy``  
  Lineage of host taxa (tab-delimited): ``host_taxonomy.tsv``  
- ``--matrix``  
  Virus-host abundance matrix (tab-delimited): ``virus_host_abundances.tsv``  
  *Columns represent taxa; rows represent samples.*
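
Before launching ``host_prediction``, it can help to confirm that the abundance matrix is consistently tab-delimited. The following is a minimal sketch, not part of ViromeXplore: ``check_matrix`` is a hypothetical helper, the toy matrix uses made-up sample and taxon names, and the presence of a header row is an assumption.

.. code-block:: bash

   #!/usr/bin/env bash
   # Toy matrix for illustration only (hypothetical sample and taxon names).
   matrix=$(mktemp)
   printf 'sample\ttaxonA\ttaxonB\n' >  "$matrix"
   printf 'S1\t10\t0\n'              >> "$matrix"
   printf 'S2\t3\t7\n'               >> "$matrix"

   # Report any row whose tab-separated field count differs from the first row.
   check_matrix() {
     awk -F'\t' '
       NR == 1 { n = NF; next }
       NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
       END     { exit bad }
     ' "$1"
   }

   if check_matrix "$matrix"; then
     echo "matrix layout looks consistent"
   fi
   rm -f "$matrix"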

Optional Arguments
-------------------

- ``--result_dir``  
  Directory to store output files.  
  *Default:* ``results``

- ``--cpus``  
  Number of CPUs to use.  
  *Default: all available*

- ``--memory``  
  Memory (in GB) to allocate.  
  *Default: 12 GB*

- ``--help``  
  Display help message.

- ``--workdir``  
  Work directory for Nextflow.  
  *Default:* ``work``
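
For example, the optional arguments can be appended to any of the usage commands. The values below are arbitrary, and passing ``--memory`` as a bare number of gigabytes is an assumption based on the description above.

.. code-block:: bash

   #!/usr/bin/env bash
   # Illustrative values only; adjust to your machine and data.
   cmd=(
     nextflow run ViromeXplore.nf
     --pipeline qc_classify
     --reads "basename_{1,2}.fastq"
     --result_dir my_results
     --cpus 8
     --memory 32   # assumed to be interpreted as GB
   )

   # Print the assembled command; run "${cmd[@]}" to execute it.
   printf '%s\n' "${cmd[*]}"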

Tool-Specific Parameters
------------------------

**ViromeQC**

- ``samp_type``  
  Sample type.  
  *Default:* ``environmental``

**VirSorter2**

- ``virsorter_minlength``  
  Minimum contig length to keep.  
  *Default:* ``1500``

**Fastp**

- ``phred_quality``  
  Minimum phred quality score for filtering.  
  *Default:* ``30``

**MEGAHIT**

- ``kmers_assembly``  
  K-mer sizes to use for assembly.  
  *Default:* ``21,35,49,63,77,91,105,119,127``

**COBRA**

- ``cobra_assembly``  
  Assembly method used.  
  *Default:* ``megahit``

- ``min_kmer``  
  Minimum k-mer size.  
  *Default:* ``21``

- ``max_kmer``  
  Maximum k-mer size.  
  *Default:* ``127``

Custom Database Arguments
-------------------------

By default, ViromeXplore uses bundled reference databases.  
However, users may specify **custom databases** for the following tools:

- ``--virsorterdb``  
  Custom VirSorter2 database path.

- ``--checkvdb``  
  Custom CheckV database path.

- ``--kaijudb``  
  Custom Kaiju database path.

- ``--virushostdb``  
  Custom virus-host database path.

- ``--genomaddb``  
  Custom geNomad database path.

- ``--eggnogdb``  
  Custom EggNOG database path.

.. note::

   All parameters (including defaults and database locations)  
   are defined in the ``nextflow.config`` file.  
   Users may edit this file directly or override parameters via the command line.
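
A command-line override might look like the sketch below. The database path is a hypothetical placeholder. Passing a tool-specific parameter as ``--virsorter_minlength`` assumes it is defined under ``params`` in ``nextflow.config`` (Nextflow exposes such parameters as double-dash options), and that VirSorter2 runs as part of ``find_viruses``.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical database location and an arbitrary length threshold.
   cmd=(
     nextflow run ViromeXplore.nf
     --pipeline find_viruses
     --contigs contigs.fasta
     --virsorterdb /path/to/virsorter2_db
     --virsorter_minlength 3000
   )

   # Print the assembled command; run "${cmd[@]}" to execute it.
   printf '%s\n' "${cmd[*]}"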


Available Pipelines
-------------------

- **qc_classify**  
  Detects non-viral contamination and classifies reads.  
  *(Requires Illumina FASTQ files)*

- **viral_assembly**  
  Performs QC and assembly of virome reads.  
  *(Requires Illumina FASTQ files)*

- **find_viruses**  
  Identifies and annotates viral sequences.  
  *(Requires FASTA contigs file)*

- **high_quality_genomes**  
  Estimates abundance and improves genome completeness.  
  *(Requires FASTA contigs, viral contigs, and Illumina FASTQ files)*

- **taxonomy_annotation**  
  Assigns taxonomy and gene functions to viral genomes.  
  *(Requires FASTA viral contigs/genomes)*

- **host_prediction**  
  Predicts virus-host interactions using abundance, taxonomy, and phylogeny.  
  *(Requires Newick tree, taxonomy file, and abundance matrix)*