Ensembl Annotation for Gene Expression Data

gene_x 0 like s 1210 view s

Tags: R, human, genomics

The code provided is designed to annotate a list of genes by fetching associated gene information from the Ensembl database. Starting with the biomaRt library, it connects to the Ensembl dataset specifically tailored for human genes. Then, gene expression data is imported from a CSV file named "DEGs_heatmap_data.csv". The primary loop in the script iterates over each gene, retrieves detailed attributes such as gene name, gene biotype, chromosome position, and more, and then appends this to the initial data. To manage instances where the database returns multiple entries for a single gene, or none at all, there are conditions to handle such cases by either selecting the first entry or populating with 'NA' values respectively. As the loop progresses, for every 100 records, a progress message is printed to give users an indication of how much work has been completed. Once all genes are annotated, the enriched dataset is saved to a new CSV file named "annotated_output_file.csv".

# Load the necessary library
library(biomaRt)

# Connect to the Ensembl dataset
ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl", version="104")

# Read the data from your CSV file
data <- read.csv("DEGs_heatmap_data.csv", header=TRUE, row.names=1)

# Initialize an empty data frame for the annotated data
annotated_data <- data.frame()

# Determine total number of genes
total_genes <- length(rownames(data))

# Loop through each gene to annotate
for (i in 1:total_genes) {
    gene <- rownames(data)[i]
    result <- getBM(attributes = c('ensembl_gene_id', 'external_gene_name', 'gene_biotype', 'entrezgene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'description'),
                    filters = 'ensembl_gene_id',
                    values = gene,
                    mart = ensembl)

    # If multiple rows are returned, take the first one
    if (nrow(result) > 1) {
        result <- result[1, ]
    }

    # Check if the result is empty
    if (nrow(result) == 0) {
        result <- data.frame(ensembl_gene_id = gene,
                             external_gene_name = NA,
                             gene_biotype = NA,
                             entrezgene_id = NA,
                             chromosome_name = NA,
                             start_position = NA,
                             end_position = NA,
                             strand = NA,
                             description = NA)
    }

    # Transpose expression values
    expression_values <- t(data.frame(t(data[gene, ])))
    colnames(expression_values) <- colnames(data)

    # Combine gene information and expression data
    combined_result <- cbind(result, expression_values)

    # Append to the final dataframe
    annotated_data <- rbind(annotated_data, combined_result)

    # Print progress every 100 genes
    if (i %% 100 == 0) {
        cat(sprintf("Processed gene %d out of %d\n", i, total_genes))
    }
}

# Save the annotated data to a new CSV file
write.csv(annotated_data, "annotated_output_file.csv", row.names=FALSE)

like unlike

点赞本文的读者

还没有人对此文章表态


本文有评论

没有评论

看文章,发评论,不要沉默

最受欢迎文章

  1. Motif Discovery in Biological Sequences: A Comparison of MEME and HOMER
  2. Why Do Significant Gene Lists Change After Adding Additional Conditions in Differential Gene Expression Analysis?
  3. Calling peaks using findPeaks of HOMER
  4. PiCRUST2 Pipeline for Functional Prediction and Pathway Analysis in Metagenomics
  5. Should the inputs for GSVA be normalized or raw?
  6. Updating Human Gene Identifiers using Ensembl BioMart: A Step-by-Step Guide
  7. Kraken2 Installation and Usage Guide
  8. pheatmap vs heatmap.2
  9. Setup conda environments
  10. Guide to Submitting Data to GEO (Gene Expression Omnibus)

最新文章


最多评论文章


推荐相似文章

Display viral transcripts found in mRNA-seq MKL-1, WaGa EVs compared to cells

Human RNA-seq processing for Data_Ben_RNAseq_2025

What type of evidence was used to assert linkage across the assembly gaps?

Comprehensive Genomic and Phenotypic Characterization of a Clinically Susceptible Acinetobacter baumannii Isolate Harboring Extensive Resistance Determinants


© 2023 XGenes.com Impressum