dna package

Submodules

dna.dna_averages module

Module containing the HelParAverages class and the command line interface.

class dna.dna_averages.HelParAverages(input_ser_path, output_csv_path, output_jpg_path, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_dna HelParAverages

Load the .ser file for a given helical parameter, read each column (one per base or base pair), and compute the average of each column.

Parameters:

input_ser_path (str) – Path to .ser file for helical parameter. File is expected to be a table, with the first column being an index and the rest the helical parameter values for each base/basepair. File type: input. Sample file. Accepted formats: ser (edam:format_2330).
output_csv_path (str) –
Path to .csv file where output is saved. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_jpg_path (str) –
Path to .jpg file where output is saved. File type: output. Sample file. Accepted formats: jpg (edam:format_3579).
properties (dict) –
- sequence (str) - (None) Nucleic acid sequence corresponding to the input .ser file. Length of sequence is expected to be the same as the total number of columns in the .ser file, minus the index column (even if later on a subset of columns is selected with the seqpos option).
- helpar_name (str) - (Optional) helical parameter name.
- stride (int) - (1000) granularity of the number of snapshots for plotting time series.
- seqpos (list) - (None) list of sequence positions (column indices starting at 0) to analyze. If not specified, the complete sequence is analyzed.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.

Examples

The following example shows how to use the building block from Python:

from biobb_dna.dna.dna_averages import dna_averages

prop = {
    'helpar_name': 'twist',
    'seqpos': [1,2],
    'sequence': 'GCAT'
}
dna_averages(
    input_ser_path='/path/to/twist.ser',
    output_csv_path='/path/to/table/output.csv',
    output_jpg_path='/path/to/table/output.jpg',
    properties=prop)
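
To illustrate what the block computes, here is a minimal sketch of per-column averaging written with the standard library only. It is not the package's implementation. It assumes the .ser table is whitespace-separated, and the helper name `column_averages` is hypothetical.

```python
from statistics import mean


def column_averages(ser_text, sequence, seqpos=None):
    """Average each value column of a .ser-like table (hypothetical helper).

    Assumption: the table is whitespace-separated, and the first column
    is an index that is not part of the helical parameter values.
    """
    rows = [line.split() for line in ser_text.splitlines() if line.strip()]
    # Drop the index column and convert the remaining fields to floats.
    values = [[float(field) for field in row[1:]] for row in rows]
    n_columns = len(values[0])
    # The sequence is expected to match the number of value columns,
    # even when only a subset of columns is selected afterwards.
    if len(sequence) != n_columns:
        raise ValueError(
            f"sequence has {len(sequence)} letters, table has {n_columns} value columns"
        )
    # Analyze every column unless specific 0-based positions are given.
    positions = list(range(n_columns)) if seqpos is None else list(seqpos)
    return {pos: mean(row[pos] for row in values) for pos in positions}


# Two snapshots, four value columns, matching the 'GCAT' sequence above.
table = "1 1.0 2.0 3.0 4.0\n2 3.0 4.0 5.0 6.0\n"
averages = column_averages(table, "GCAT", seqpos=[1, 2])
```

The sequence length is checked against the full table before `seqpos` is applied, mirroring the note in the `sequence` property description.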

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl

launch() → int[source]: Execute the HelParAverages object.

dna.dna_averages.dna_averages(input_ser_path: str, output_csv_path: str, output_jpg_path: str, properties: Optional[dict] = None, **kwargs) → int[source]: Create HelParAverages class and execute the launch() method.

dna.dna_averages.main()[source]: Command line execution of this building block. Please check the command line documentation.

dna.dna_timeseries module

Module containing the HelParTimeSeries class and the command line interface.

class dna.dna_timeseries.HelParTimeSeries(input_ser_path, output_zip_path, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_dna HelParTimeSeries

Create time series and histogram plots for each base pair from a helical parameter series file.

Parameters:

input_ser_path (str) –
Path to .ser file for helical parameter. File is expected to be a table, with the first column being an index and the rest the helical parameter values for each base/basepair. File type: input. Sample file. Accepted formats: ser (edam:format_2330).
output_zip_path (str) –
Path to the output .zip file where data is saved. File type: output. Sample file. Accepted formats: zip (edam:format_3987).
properties (dict) –
- sequence (str) - (None) Nucleic acid sequence corresponding to the input .ser file. Length of sequence is expected to be the same as the total number of columns in the .ser file, minus the index column (even if later on a subset of columns is selected with the seqpos option).
- bins (int) - (None) Bins for histogram. Parameter has same options as matplotlib.pyplot.hist.
- helpar_name (str) - (Optional) helical parameter name.
- stride (int) - (1000) granularity of the number of snapshots for plotting time series.
- seqpos (list) - (None) list of sequence positions (column indices starting at 0) to analyze. If not specified, the complete sequence is analyzed.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.

Examples

The following example shows how to use the building block from Python:

from biobb_dna.dna.dna_timeseries import dna_timeseries

prop = {
    'helpar_name': 'twist',
    'seqpos': [1,2,3,4,5],
    'sequence': 'GCAACGTGCTATGGAAGC',
}
dna_timeseries(
    input_ser_path='/path/to/twist.ser',
    output_zip_path='/path/to/output/file.zip',
    properties=prop)
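
The `stride` property is described only as the granularity of snapshots used for plotting. One plausible reading, shown here purely as an assumption and not as the package's confirmed behavior, is that every `stride`-th snapshot is kept. The helper name `subsample` is hypothetical.

```python
def subsample(series, stride=1000):
    """Keep every `stride`-th snapshot of a series (hypothetical helper).

    Assumption: `stride` thins the series by plain slicing; the real
    block may reduce the data differently.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return series[::stride]


# Ten snapshots thinned with a stride of 3 keep indices 0, 3, 6 and 9.
snapshots = list(range(10))
thinned = subsample(snapshots, stride=3)
```

Under this reading, a larger `stride` gives a lighter plot at the cost of temporal detail.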

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl

launch() → int[source]: Execute the HelParTimeSeries object.

dna.dna_timeseries.dna_timeseries(input_ser_path: str, output_zip_path: str, properties: Optional[dict] = None, **kwargs) → int[source]: Create HelParTimeSeries class and execute the launch() method.

dna.dna_timeseries.main()[source]: Command line execution of this building block. Please check the command line documentation.

dna.dna_bimodality module

Module containing the HelParBimodality class and the command line interface.

class dna.dna_bimodality.HelParBimodality(input_csv_file, output_csv_path, output_jpg_path, input_zip_file=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_dna HelParBimodality

Determine binormality/bimodality from a helical parameter series dataset.

Parameters:

input_csv_file (str) –
Path to .csv file with helical parameter series. If input_zip_file is passed, this should be just the filename of the .csv file inside the .zip. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
input_zip_file (str) (Optional) – .zip file containing the input_csv_file .csv file. File type: input. Accepted formats: zip (edam:format_3987).
output_csv_path (str) –
Path to .csv file where output is saved. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_jpg_path (str) –
Path to .jpg file where output is saved. File type: output. Sample file. Accepted formats: jpg (edam:format_3579).
properties (dict) –
- helpar_name (str) - (Optional) helical parameter name.
- confidence_level (float) - (5.0) Confidence level for the Bayes factor test (in percentage).
- max_iter (int) - (400) Maximum number of iterations for the EM algorithm.
- tol (float) - (1e-5) Tolerance value for the EM algorithm.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.

Examples

The following example shows how to use the building block from Python:

from biobb_dna.dna.dna_bimodality import dna_bimodality

prop = {
    'max_iter': 500,
}
dna_bimodality(
    input_csv_file='filename.csv',
    input_zip_file='/path/to/input.zip',
    output_csv_path='/path/to/output.csv',
    output_jpg_path='/path/to/output.jpg',
    properties=prop)
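
The pairing of `input_zip_file` with a bare `input_csv_file` name can be pictured with the standard `zipfile` module. This is a minimal sketch of reading a named .csv member from an archive, not the package's own loading code; the helper name `read_csv_from_zip` and the sample contents are made up for illustration.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, csv_name):
    """Return the rows of one .csv member of a zip archive (hypothetical helper).

    `zip_source` may be a path or a binary file-like object;
    `csv_name` is the member's name inside the archive, not a filesystem path.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(csv_name) as raw:
            # ZipFile.open yields bytes, so wrap it for the csv reader.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


# Build a small in-memory archive so the sketch runs without external files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("filename.csv", "frame,twist\n1,34.2\n2,35.1\n")
buffer.seek(0)

rows = read_csv_from_zip(buffer, "filename.csv")
```

This shows why `input_csv_file` is given as `'filename.csv'` rather than a full path in the example above: it names a member of the archive.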

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl

bayes_factor_criteria(bic1, bic2)[source]

fit_to_model(data)[source]: Fit data to Gaussian Mixture models. Return dictionary with distribution data.

helguero_theorem(mean1, mean2, var1, var2)[source]

launch() → int[source]: Execute the HelParBimodality object.

dna.dna_bimodality.dna_bimodality(input_csv_file, output_csv_path, output_jpg_path, input_zip_file: Optional[str] = None, properties: Optional[dict] = None, **kwargs) → int[source]: Create HelParBimodality class and execute the launch() method.

dna.dna_bimodality.main()[source]: Command line execution of this building block. Please check the command line documentation.