Helper function for cleaning and standardizing regions.

Usage

purify_regions(
  these_regions = NULL,
  qchrom = NULL,
  qstart = NULL,
  qend = NULL,
  projection = "hg38"
)

Arguments

these_regions

The region(s) to be queried. Either a data frame of regions with the columns chrom, start, and end, or a string (or character vector of strings) in the format chr:start-end.

qchrom

Query chromosome (prefixed or un-prefixed). Required if `these_regions` is not provided.

qstart

Query start position. Required if `these_regions` is not provided.

qend

Query end position. Required if `these_regions` is not provided.

projection

The projection of the returned coordinates. Available projections are hg38 and grch37. Default is hg38.

Value

A data table with three columns: chrom, start, end.

Details

This function accepts regions in several forms. Regions can be provided as a data frame with `these_regions`, in which case the following columns must exist: chrom, start, end. The same parameter (`these_regions`) also accepts regions in "region" format (i.e. chr:start-end), either as a single string or as a character vector with multiple regions. Alternatively, the user can specify region(s) individually with `qchrom` (string), `qstart` (string or integer), and `qend` (string or integer). These parameters also accept vectors for multiple regions.

The function handles chromosome prefixes in the returned object based on the selected `projection`. In addition, it checks whether the provided start coordinate is equal to or greater than the end coordinate for the same chromosome, and it ensures that the specified ranges are within the actual chromosomal range.
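To make the "region" format and the prefix handling concrete, the following is a minimal base R sketch of how a chr:start-end string could be split and standardized. It is not the implementation of `purify_regions`: the helper name `parse_region_sketch` is hypothetical, the prefix convention (prefixed for hg38, un-prefixed for grch37) is inferred from the examples on this page, and stopping with an error when start is not smaller than end is an assumption, since this page does not state how a failed check is reported. The chromosomal range check is left out.

```r
# Hypothetical helper for illustration only; not part of the package.
parse_region_sketch <- function(regions, projection = "hg38") {
  # Capture an optional "chr" prefix, the chromosome name, start, and end.
  pattern <- "^(chr)?([^:]+):([0-9]+)-([0-9]+)$"
  if (!all(grepl(pattern, regions))) {
    stop("regions must look like chr:start-end")
  }

  # Pull each part out of the string; the prefix is dropped at this stage.
  chrom <- sub(pattern, "\\2", regions)
  start <- as.numeric(sub(pattern, "\\3", regions))
  end   <- as.numeric(sub(pattern, "\\4", regions))

  # Assumed convention, inferred from the examples: hg38 is prefixed,
  # grch37 is un-prefixed.
  if (projection == "hg38") {
    chrom <- paste0("chr", chrom)
  }

  # Assumed behaviour: refuse regions whose start is equal to or greater
  # than their end.
  if (any(start >= end)) {
    stop("start must be smaller than end")
  }

  data.frame(chrom = chrom, start = start, end = end,
             stringsAsFactors = FALSE)
}

# Prefixed and un-prefixed input, standardized to the grch37 convention.
parse_region_sketch(c("chr1:100-500", "2:200-600"), projection = "grch37")
```

The sketch accepts both prefixed and un-prefixed chromosome names as input and only decides on the prefix when building the output, which mirrors how the `projection` argument is described above.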

Examples

#Example 1 - Give the function one region as a string
purify_regions(these_regions = "chr1:100-500")
#>    chrom start end
#> 1:  chr1   100 500

#Example 2 - Give the function multiple regions as a vector of strings
purify_regions(these_regions = c("chr1:100-500", "chr2:100-500"),
               projection = "grch37")
#>    chrom start end
#> 1:     1   100 500
#> 2:     2   100 500

#Example 3 - Individually specify the chromosome, start and end coordinates
purify_regions(qchrom = "chr1",
               qstart = 100,
               qend = 500)
#>    chrom start end
#> 1:  chr1   100 500

#Example 4 - Individually specify multiple regions with the query parameters
purify_regions(qchrom = c("chr1", "chr2"),
               qstart = c(100, 200),
               qend = c(500, 600),
               projection = "grch37")
#>    chrom start end
#> 1:     1   100 500
#> 2:     2   200 600