
test.co.recstat

Tests if regions located between Stop codons contain putative CDSs.


Description

This test uses the factor scores of the columns (codons) computed by recstat to determine whether the regions located between two Stop codons correspond to putative CDSs.

Usage

test.co.recstat(rec, fac = 1, length.min = 150, stop.max = 0.2, win.lim = 0.8,
    direct = TRUE, level = 0.01)

Arguments

rec

list of elements returned by the recstat function.

fac

axis of the CA to use for the test (1 ≤ fac ≤ 4).

length.min

minimum length, in nucleotides, between two Stop codons.

stop.max

threshold on the relative position of a Stop codon within a window, used to decide whether this window can be used for test computation.

win.lim

minimum proportion of windows inside a region showing a p-value below the threshold for Kruskal-Wallis test.

direct

a logical selecting the direct (TRUE) or reverse (FALSE) strand.

level

p-value threshold for Kruskal-Wallis test.

Details

The test is computed for all windows located between two Stop codons separated by at least length.min nucleotides. For each window inside a region under consideration, a Kruskal-Wallis test is computed on the factor scores of the codons found in this window, for each of the three possible reading frames. If the proportion of windows in the region that reject the null hypothesis of equality of means between the reading frames is at least win.lim, a CDS is likely to be located in the region.
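The decision rule described above can be sketched in base R. This is a minimal illustration and not the seqinr implementation: the helpers window_pvalue and region_has_cds are hypothetical names, and each window is assumed to be given as a list of three numeric vectors holding the factor scores of the codons read in each frame. The scores are made-up values chosen only to show the rule.

```r
# Hypothetical helper: p-value of a Kruskal-Wallis test comparing the factor
# scores observed in the three reading frames of one window.
window_pvalue <- function(scores_by_frame) {
  stats::kruskal.test(scores_by_frame)$p.value
}

# Hypothetical helper: a region is flagged as a putative CDS when the
# proportion of its windows with p < level is at least win.lim.
region_has_cds <- function(windows, level = 0.01, win.lim = 0.8) {
  p <- vapply(windows, window_pvalue, numeric(1))
  mean(p < level) >= win.lim
}

# Made-up window where the three frames have clearly different scores.
separated <- list(f1 = 1:10, f2 = 101:110, f3 = 201:210)
# Made-up window where the three frames have interleaved, similar scores.
mixed <- list(f1 = seq(1, 28, by = 3),
              f2 = seq(2, 29, by = 3),
              f3 = seq(3, 30, by = 3))

# Region with 9 of 10 discriminating windows, and one with only 1 of 5.
coding_like    <- c(rep(list(separated), 9), list(mixed))
noncoding_like <- c(list(separated), rep(list(mixed), 4))

region_has_cds(coding_like)
region_has_cds(noncoding_like)
```

With the default level and win.lim, the first region passes the threshold and the second does not.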

For the first and last windows of a region submitted to the test, the relative position of the two Stop codons determines whether those windows can be used in the analysis. If the first Stop is located within the stop.max fraction of the 5' end of the window, this window is kept in the analysis. Likewise, if the second Stop is located within the stop.max fraction of the 3' end of the window, this window is also kept in the analysis.
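One possible reading of this edge rule is sketched below. It assumes that the relative position of a Stop codon is its distance from the relevant end of the window divided by the window length; the function keep_edge_window and its arguments are hypothetical, and the exact computation used by seqinr may differ.

```r
# Hypothetical helper: decide whether an edge window is kept, given the window
# bounds and the position of the Stop codon that falls inside it.
# end = "5prime" is for the first window of a region (first Stop),
# end = "3prime" is for the last window of a region (second Stop).
keep_edge_window <- function(win_start, win_end, stop_pos,
                             end = c("5prime", "3prime"), stop.max = 0.2) {
  end <- match.arg(end)
  win_len <- win_end - win_start
  # Assumed definition: distance from the relevant window end, as a fraction
  # of the window length.
  rel <- if (end == "5prime") {
    (stop_pos - win_start) / win_len
  } else {
    (win_end - stop_pos) / win_len
  }
  rel <= stop.max
}

# First window spanning positions 1..90, Stop near the 5' end: kept.
keep_edge_window(1, 90, stop_pos = 10, end = "5prime")
# Same window, Stop in the middle: discarded.
keep_edge_window(1, 90, stop_pos = 60, end = "5prime")
# Last window, Stop near the 3' end: kept.
keep_edge_window(1, 90, stop_pos = 85, end = "3prime")
```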

Value

The result is returned as a list containing three matrices (one for each reading frame). All matrices have the same structure, with rows corresponding to the regions between two Stop codons. Columns Start and End give the starting and ending positions of the region, and CDS is a binary indicator equal to 1 if a putative CDS is predicted and 0 otherwise.
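As an illustration of how such a result might be used, the sketch below builds a mock list with the structure described above and keeps only the regions predicted as CDSs. The object mock_result and all its coordinates are made up for the example; they are not output from test.co.recstat, and the real list may carry element names that are not shown here.

```r
# Hypothetical helper to build one per-frame matrix with the documented columns.
frame_matrix <- function(values) {
  matrix(values, ncol = 3, byrow = TRUE,
         dimnames = list(NULL, c("Start", "End", "CDS")))
}

# Mock result: one matrix per reading frame (made-up coordinates).
mock_result <- list(
  frame_matrix(c(1, 300, 1,
                 301, 500, 0)),
  frame_matrix(c(2, 200, 0,
                 203, 900, 1,
                 905, 1100, 1)),
  frame_matrix(c(3, 450, 0))
)

# Keep the Start and End of regions flagged as putative CDSs in each frame.
predicted <- lapply(mock_result, function(m) {
  m[m[, "CDS"] == 1, c("Start", "End"), drop = FALSE]
})

# Number of predicted regions per reading frame.
vapply(predicted, nrow, integer(1))
```

Using drop = FALSE keeps each element a matrix even when a frame has zero or one predicted region, which keeps downstream code uniform.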

Author(s)

O. Clerc, G. Perrière

See Also

Examples

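## Not run: # CPU time is too long with windows
library(seqinr)
# Locate the example FASTA file shipped with seqinr
ff <- system.file("sequences/ECOUNC.fsa", package = "seqinr")
seq <- read.fasta(ff)
# Compute the codon factor scores with recstat, then test the regions between Stop codons
rec <- recstat(seq[[1]], seqname = getName(seq))
test.co.recstat(rec)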

## End(Not run)

seqinr

Biological Sequences Retrieval and Analysis

v4.2-16
GPL (>= 2)
Authors
Delphine Charif [aut], Olivier Clerc [ctb], Carolin Frank [ctb], Jean R. Lobry [aut, cph], Anamaria Necşulea [ctb], Leonor Palmeira [ctb], Simon Penel [cre], Guy Perrière [ctb]
Initial release
2022-05-19
