Convenience function that returns information about a gene or a set of genes. It is called internally by BioMaesteR::get_gene_region().

Usage

get_gene_info(these_genes = NULL, projection = "hg38", raw = FALSE)

Arguments

these_genes

Required argument. The gene or genes of interest, given as a character vector of Hugo symbols or Ensembl gene IDs.

projection

The desired genome projection. Default is hg38; grch37 is also accepted (see Example 3).

raw

Default is FALSE, which returns a subset of columns. Set to TRUE to keep all columns.

Value

A data frame with gene information.

Details

Give the function a gene or a set of genes (as a character vector) and, optionally, a projection (hg38 is the default). The function returns gene information based on the bundled data. By default it runs with `raw = FALSE`, which returns a subset of columns. To get all available columns back, set `raw` to `TRUE`.
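
The behaviour described above can be illustrated with a small self-contained sketch. It is not the package's implementation: `toy_genes` and `toy_gene_info()` are hypothetical names, the table holds only the two hg38 records shown in Example 4, and detecting Ensembl IDs by their `ENSG` prefix is an assumption made for illustration. The sketch only shows the idea of matching on either identifier type and trimming columns unless `raw = TRUE`.

```r
# Hypothetical stand-in for the bundled annotation (values copied from Example 4).
toy_genes <- data.frame(
  chrom = c("chr8", "chr18"),
  start = c(127735434, 63123346),
  end = c(127742951, 63320128),
  ensembl_gene_id = c("ENSG00000136997", "ENSG00000171791"),
  hugo_symbol = c("MYC", "BCL2"),
  gene_biotype = c("protein_coding", "protein_coding"),
  stringsAsFactors = FALSE
)

# Hypothetical helper written only to illustrate the `raw` toggle.
toy_gene_info <- function(these_genes = NULL, raw = FALSE, genes = toy_genes) {
  if (is.null(these_genes)) {
    stop("these_genes is required")
  }
  # Assumption: inputs that all start with "ENSG" are Ensembl gene IDs.
  is_ensembl <- all(grepl("^ENSG", these_genes))
  key <- if (is_ensembl) "ensembl_gene_id" else "hugo_symbol"
  hits <- genes[genes[[key]] %in% these_genes, , drop = FALSE]
  if (!raw) {
    # Put the queried identifier first and drop the coordinate columns.
    other <- setdiff(c("hugo_symbol", "ensembl_gene_id"), key)
    hits <- hits[, c(key, other, "gene_biotype"), drop = FALSE]
  }
  rownames(hits) <- NULL
  hits
}

trimmed <- toy_gene_info("MYC")
full <- toy_gene_info(c("ENSG00000136997", "ENSG00000171791"), raw = TRUE)
```

Here `trimmed` holds only the identifier and biotype columns, while `full` keeps every column of the toy table, mirroring the difference between `raw = FALSE` and `raw = TRUE` in the examples that follow.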

Examples

#Example 1 - Query one gene (as a Hugo symbol) with default parameters.
get_gene_info(these_genes = "MYC")
#>   hugo_symbol ensembl_gene_id type   gene_biotype         source gene_version
#> 1         MYC ENSG00000136997 gene protein_coding ensembl_havana           22
#>      gene_source  tag ccds_id score transcript_id transcript_version
#> 1 ensembl_havana <NA>    <NA>    NA          <NA>               <NA>
#>   transcript_name transcript_source transcript_biotype exon_number exon_id
#> 1            <NA>              <NA>               <NA>        <NA>    <NA>
#>   protein_id protein_version
#> 1       <NA>            <NA>

#Example 2 - Same as Example 1, but MYC is specified here by its Ensembl ID.
get_gene_info(these_genes = "ENSG00000136997")
#>   ensembl_gene_id hugo_symbol type   gene_biotype         source gene_version
#> 1 ENSG00000136997         MYC gene protein_coding ensembl_havana           22
#>      gene_source  tag ccds_id score transcript_id transcript_version
#> 1 ensembl_havana <NA>    <NA>    NA          <NA>               <NA>
#>   transcript_name transcript_source transcript_biotype exon_number exon_id
#> 1            <NA>              <NA>               <NA>        <NA>    <NA>
#>   protein_id protein_version
#> 1       <NA>            <NA>

#Example 3 - Request multiple genes with a non-default projection.
get_gene_info(these_genes = c("MYC", "BCL2"),
              projection = "grch37")
#>   hugo_symbol ensembl_gene_id type   gene_biotype         source gene_version
#> 1         MYC ENSG00000136997 gene protein_coding ensembl_havana           10
#> 2        BCL2 ENSG00000171791 gene protein_coding ensembl_havana           10
#>      gene_source  tag ccds_id score transcript_id transcript_version
#> 1 ensembl_havana <NA>    <NA>    NA          <NA>               <NA>
#> 2 ensembl_havana <NA>    <NA>    NA          <NA>               <NA>
#>   transcript_name transcript_source transcript_biotype exon_number exon_id
#> 1            <NA>              <NA>               <NA>        <NA>    <NA>
#> 2            <NA>              <NA>               <NA>        <NA>    <NA>
#>   protein_id protein_version
#> 1       <NA>            <NA>
#> 2       <NA>            <NA>

#Example 4 - Request multiple Ensembl IDs and return all columns.
get_gene_info(these_genes = c("ENSG00000136997", "ENSG00000171791"),
              raw = TRUE)
#>   chrom     start       end  width strand type  tag ccds_id ensembl_gene_id
#> 1 chr18  63123346  63320128 196783      - gene <NA>    <NA> ENSG00000171791
#> 2  chr8 127735434 127742951   7518      + gene <NA>    <NA> ENSG00000136997
#>   hugo_symbol         source score gene_version    gene_source   gene_biotype
#> 1        BCL2 ensembl_havana    NA           14 ensembl_havana protein_coding
#> 2         MYC ensembl_havana    NA           22 ensembl_havana protein_coding
#>   transcript_id transcript_version transcript_name transcript_source
#> 1          <NA>               <NA>            <NA>              <NA>
#> 2          <NA>               <NA>            <NA>              <NA>
#>   transcript_biotype exon_number exon_id exon_version protein_id
#> 1               <NA>        <NA>    <NA>         <NA>       <NA>
#> 2               <NA>        <NA>    <NA>         <NA>       <NA>
#>   protein_version input_format
#> 1            <NA>      Ensembl
#> 2            <NA>      Ensembl