一个R函数完成所有的富集分析

科技 07-31 来源： ZhangYiHui

仅仅一个函数，就可以完成差异表达基因的GO，GSEA，KEGG 分析与可视化，下面来看一看

01 需要的R包

library(AnnotationDbi)library(org.Hs.eg.db)  基因注释需要library(clusterProfiler)library(stringr)library(enrichplot)#画图需要library(stats)library(dplyr)library(msigdbr)

02 定义R函数

首先定义一个R函数，目的是：读入数据；进行富集分析。

#定义R函数allEnrich <- function(df) {df = read.csv(df, stringsAsFactors = F,                header = T)names(df) = c("gene", "avg_logFC")#msigdbr 包提供多个物种的MSigDB数据b = msigdbr(species = "Homo sapiens", category = "C2") %>%      select(gs_name, entrez_gene)a1 = df$avg_logFCa2 = easyConvert(species = "HUMAN",queryList = df$gene,queryType = "SYMBOL"    ) names(a1) = a2$ENTREZIDb1 = a1[order(a1, decreasing = T)]GSEA分析gsearesults = GSEA(b1, TERM2GENE = b)#进行CC分析ego1 <- enrichGO(gene         = df$gene,OrgDb         = org.Hs.eg.db,keyType       = 'SYMBOL',ont           = "CC",pAdjustMethod = "BH",pvalueCutoff  = 0.01,qvalueCutoff  = 0.05)#绘图p1 = dotplot(ego1, showCategory = 15) ggplot2::ggsave("CC.pdf", p1, width = 7, height = 8)#输出结果，保存到本地write.csv(ego1@result, "Cell_component.csv")=========================================#进行BP分析ego2 <- enrichGO(gene         = df$gene,OrgDb         = org.Hs.eg.db,keyType       = 'SYMBOL',ont           = "BP",pAdjustMethod = "BH",pvalueCutoff  = 0.01,qvalueCutoff  = 0.05)#绘图p2 = dotplot(ego2, showCategory = 15) ggplot2::ggsave("BP.pdf", p2, width = 7, height = 8)#输出结果，保存到本地write.csv(ego2@result, "Biological_process.csv")=======================================#进行MF分析ego3 <- enrichGO(gene         = df$gene,OrgDb         = org.Hs.eg.db,keyType       = 'SYMBOL',ont           = "MF",pAdjustMethod = "BH",pvalueCutoff  = 0.01,qvalueCutoff  = 0.05)#绘图p3 = dotplot(ego3, showCategory = 15) ggplot2::ggsave("MF.pdf", p3, width = 7, height = 8)#输出结果，保存到本地write.csv(ego3@result, "Molecular_function.csv")=========================================GSEA可视化p4 = gsearesults@result %>%mutate(Description = gsub("_", " ", Description)) %>%mutate(Description = reorder(Description, setSize)) %>% head(15) %>%ggplot(aes(x = enrichmentScore,y = Description,size = setSize,color = pvalue)) +geom_point() +scale_size(range = c(1, 5), name = "Size") +scale_color_gradient(low = "blue", high = "red") +xlab("GeneRatio") + ylab("") +theme(axis.text.y = element_text(size = 10, color = "black")) ggplot2::ggsave("GSEA.pdf", p4, width = 7, height = 8)write.csv(gsearesults@result, "GSEA_Results.csv")==================================KEGG分析ego4 = enrichKEGG(gene = names(a1),organism = "hsa",keyType = "kegg",pvalueCutoff = 0.05,pAdjustMethod = "BH",minGSSize = 10,maxGSSize = 500,qvalueCutoff = 0.2,use_internal_data = FALSE)p5 = dotplot(ego4, showCategory = 15) ggplot2::ggsave("KEGG.pdf", p5, width = 7, height = 8)write.csv(ego4@result, "KEGG_pathway.csv")    cat("Successfully~") }

03 读入数据分析

使用定义好的R函数：allEnrich，读入数据进行分析，数据格式必须为2列：

第一列 geneSymbol

第二类 Log2FoldChange

#给出包含geneSymbol，Log2FoldChange所在的csv文件 allEnrich("significant_degs.csv") preparing geneSet collections...GSEA analysis...leading edge analysis...done...wrong orderBy parameter; set to default `orderBy = "x"`Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.wrong orderBy parameter; set to default `orderBy = "x"`Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.wrong orderBy parameter; set to default `orderBy = "x"`Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.Reading KEGG annotation online:Reading KEGG annotation online:wrong orderBy parameter; set to default `orderBy = "x"`Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.Successfully~

成功完成分析后，这是富集分析输出的所有结果：