Visualizing gene set enrichment analysis results (clusterProfiler) using dotHeatmap

[1]:
import os,sys
import matplotlib.pylab as plt
import pickle
import glob
import numpy as np
import matplotlib as mpl
import pandas as pd
mpl.style.use('default')
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['ps.fonttype'] = 42
mpl.rcParams['font.family'] = 'sans-serif'
mpl.rcParams['font.sans-serif'] = 'Arial'
mpl.rcParams['figure.dpi'] = 80
mpl.rcParams['savefig.dpi']=300
import pickle
# sys.path.append(os.path.expanduser("~/Projects/Github/PyComplexHeatmap"))
import PyComplexHeatmap
# print(PyComplexHeatmap.__version__)
from PyComplexHeatmap import *

Why do we need a better way to visualize the gene set enrichment analysis result?

Traditionally, we have to plot lots of barplot or dotplot to visualize the gene set enrichment analysis. For example:

2fda46a1639940c89c6aba288175f76c 186533c647d540d1bb8efdec4824bd65

What if we had lots of cell types and performed gene set enrichment analysis for each cell types? It is not a good idea to plot lots of barplot or dotplot in the manuscript.

In this case, we can merge all the enriched result into one matrix and plot a dotHeatmap using PyComplexHeatmap

Read data

In this example dataset, we have 2 groups and 3 different cell types in each group, and we have already gotten the GO Biological Process (BP), Molecular Function (MF) and KEGG Pathway enrichment result using R package clusterProfiler and merged the results into one matrix.

[2]:
with open("../data/enrichment_analysis_dotHeatmap.pickle",'rb') as f:
    data1,df_row,df_col,colors=pickle.load(f)
[3]:
data1
[3]:
ID Description GeneRatio p.adjust Count Group CellType EnrichedCategory -log10(p.adjust) ColID RowID
0 GO:0003779 actin binding 0.044949 2.453212e-41 186 Group2 CT1 MF 40.610265 Group2-CT1 MF||actin binding
1 GO:0030695 GTPase regulator activity 0.043016 6.744023e-40 178 Group2 CT1 MF 39.171081 Group2-CT1 MF||GTPase regulator activity
2 GO:0060589 nucleoside-triphosphatase regulator activity 0.043016 6.744023e-40 178 Group2 CT1 MF 39.171081 Group2-CT1 MF||nucleoside-triphosphatase regulator activity
3 GO:0005201 extracellular matrix structural constituent 0.021266 5.810239e-35 88 Group2 CT1 MF 34.235806 Group2-CT1 MF||extracellular matrix structural constituent
4 GO:0050839 cell adhesion molecule binding 0.031899 1.612420e-31 132 Group2 CT1 MF 30.792522 Group2-CT1 MF||cell adhesion molecule binding
... ... ... ... ... ... ... ... ... ... ... ...
840 GO:0050905 neuromuscular process 0.010454 1.224335e-07 122 Group2 CT2 BP 6.912100 Group2-CT2 BP||neuromuscular process
1205 GO:0030534 adult behavior 0.009426 6.028861e-06 110 Group2 CT2 BP 5.219765 Group2-CT2 BP||adult behavior
1558 GO:0035019 somatic stem cell population maintenance 0.004799 6.029991e-05 56 Group2 CT2 BP 4.219683 Group2-CT2 BP||somatic stem cell population maintenance
1746 GO:0019827 stem cell population maintenance 0.009854 1.467128e-04 115 Group2 CT2 BP 3.833532 Group2-CT2 BP||stem cell population maintenance
2999 GO:0021953 central nervous system neuron differentiation 0.008997 9.138960e-03 105 Group2 CT2 BP 2.039103 Group2-CT2 BP||central nervous system neuron differentiation

139 rows × 11 columns

[4]:
df_row
[4]:
EnrichedCategory Description
RowID
MF||actin binding MF actin binding
MF||GTPase regulator activity MF GTPase regulator activity
MF||nucleoside-triphosphatase regulator activity MF nucleoside-triphosphatase regulator activity
MF||extracellular matrix structural constituent MF extracellular matrix structural constituent
MF||cell adhesion molecule binding MF cell adhesion molecule binding
MF||metal ion transmembrane transporter activity MF metal ion transmembrane transporter activity
MF||monoatomic ion channel activity MF monoatomic ion channel activity
MF||monoatomic ion gated channel activity MF monoatomic ion gated channel activity
MF||gated channel activity MF gated channel activity
MF||calmodulin binding MF calmodulin binding
MF||transmembrane receptor protein kinase activity MF transmembrane receptor protein kinase activity
BP||cell-substrate adhesion BP cell-substrate adhesion
BP||extracellular structure organization BP extracellular structure organization
BP||extracellular matrix organization BP extracellular matrix organization
BP||external encapsulating structure organization BP external encapsulating structure organization
BP||axon guidance BP axon guidance
BP||locomotory behavior BP locomotory behavior
BP||adult behavior BP adult behavior
BP||fat cell differentiation BP fat cell differentiation
BP||neuromuscular process BP neuromuscular process
BP||central nervous system neuron differentiation BP central nervous system neuron differentiation
BP||central nervous system neuron development BP central nervous system neuron development
BP||stem cell population maintenance BP stem cell population maintenance
BP||somatic stem cell population maintenance BP somatic stem cell population maintenance
KEGG||Amphetamine addiction KEGG Amphetamine addiction
KEGG||Calcium signaling pathway KEGG Calcium signaling pathway
KEGG||Cocaine addiction KEGG Cocaine addiction
MF||cytokine activity MF cytokine activity
KEGG||Cytokine-cytokine receptor interaction KEGG Cytokine-cytokine receptor interaction
KEGG||Neuroactive ligand-receptor interaction KEGG Neuroactive ligand-receptor interaction
KEGG||Cytoskeleton in muscle cells KEGG Cytoskeleton in muscle cells
KEGG||ECM-receptor interaction KEGG ECM-receptor interaction
KEGG||Viral protein interaction with cytokine and cytokine receptor KEGG Viral protein interaction with cytokine and cy...
KEGG||Focal adhesion KEGG Focal adhesion
KEGG||Axon guidance KEGG Axon guidance
KEGG||Systemic lupus erythematosus KEGG Systemic lupus erythematosus
KEGG||ATP-dependent chromatin remodeling KEGG ATP-dependent chromatin remodeling
MF||methylation-dependent protein binding MF methylation-dependent protein binding
MF||methylated histone binding MF methylated histone binding
MF||modification-dependent protein binding MF modification-dependent protein binding
[5]:
df_col
[5]:
Group CellType
Group2-CT1 Group2 CT1
Group1-CT2 Group1 CT2
Group1-CT3 Group1 CT3
Group2-CT3 Group2 CT3
Group1-CT4 Group1 CT4
Group1-CT1 Group1 CT1
Group2-CT2 Group2 CT2

Plot

[6]:
s_min,s_max=data1.GeneRatio.min(),data1.GeneRatio.max()
[7]:
data1.loc[data1.GeneRatio==s_min]
[7]:
ID Description GeneRatio p.adjust Count Group CellType EnrichedCategory -log10(p.adjust) ColID RowID
3074 GO:0021954 central nervous system neuron development 0.004586 0.027429 42 Group2 CT3 BP 1.561784 Group2-CT3 BP||central nervous system neuron development
[8]:
data1.loc[data1.GeneRatio==s_max]
[8]:
ID Description GeneRatio p.adjust Count Group CellType EnrichedCategory -log10(p.adjust) ColID RowID
1 GO:0045444 fat cell differentiation 0.176471 0.000382 6 Group1 CT1 BP 3.417494 Group1-CT1 BP||fat cell differentiation
[9]:
left_ha = HeatmapAnnotation(
                          label=anno_label(df_row.EnrichedCategory, merge=True,rotation=0,colors=colors,relpos=(1,0.8)),
                          Category=anno_simple(df_row.EnrichedCategory,cmap='Set1',
                                           add_text=False,legend=False,colors=colors),
                           axis=0,verbose=0,label_kws={'rotation':45,'horizontalalignment':'left','visible':False})
right_ha = HeatmapAnnotation(
                          label=anno_label(df_row.Description, merge=True,rotation=0,relpos=(0,0.5),arrowprops=dict(visible=True),
                                           colors=df_row.assign(color=df_row.EnrichedCategory.map(colors)).set_index('Description').color.to_dict(),
                                          fontsize=12,luminance=0.8,height=2),
                           axis=0,verbose=0,label_kws={'rotation':45,'horizontalalignment':'left'},
                            orientation='right')

col_ha = HeatmapAnnotation(
                           Group=anno_simple(df_col.Group,cmap='Set1',legend=False,add_text=True,
                                               height=4.5,colors={'Group1':'yellowgreen','Group2':'orange'}),
                           CellClass=anno_simple(df_col.CellType,cmap='Set1',legend=False,add_text=True,
                                                 height=4.5),
                           verbose=0,label_side='right',label_kws={'horizontalalignment':'left'})

plt.figure(figsize=(4, 8))
cm = DotClustermapPlotter(data=data1, x='ColID',y='RowID',value='-log10(p.adjust)',c='-log10(p.adjust)',s='GeneRatio',
                          row_cluster=True,col_cluster=True,hue='Group',
                          cmap={'Group1':'Greens','Group2':'OrRd'},
                          vmin=-1*np.log10(0.1),vmax=-1*np.log10(1e-10),
                          colors={'Group1':'yellowgreen','Group2':'orange'},
                          marker={'Group1':'*','Group2':'$\\ast$'},
                          show_rownames=True,show_colnames=False,row_dendrogram=False,
                          col_names_side='top',#row_names_side='right',
                          xticklabels_kws={'labelrotation': 30, 'labelcolor': 'blue','labelsize':14},
                          top_annotation=col_ha,left_annotation=left_ha,right_annotation=right_ha,
                          spines=False,
                          row_split=df_row.EnrichedCategory, row_split_gap=1,
                          col_split=df_col.Group,col_split_gap=0.5,
                          verbose=1,legend_gap=8,
                          #dot_legend_marker='*',
                          # xlabel=Group,xlabel_side="top",
                          # xlabel_kws=dict(labelpad=8,fontweight='bold'),
                          # xlabel_bbox_kws=dict(facecolor=facecolor)
                         )
# cm.ax_heatmap.grid(which='minor',color='black',linestyle='--',linewidth=1)
# for ax in cm.heatmap_axes.ravel():
#     ax.grid(axis='both',which='minor',color='black',linestyle='dashdot',alpha=0.1) # which can also be set to major
    # for side in ["top", "left", "bottom"]: # to show the spines
    #     ax.spines[side].set_visible(True)
    #     ax.spines[side].set_color('grey')
    #     ax.spines[side].set_linewidth(1)
    #     ax.spines[side].set_linestyle('--')
# plt.savefig(f"cell_class_dotHeatmap.pdf",bbox_inches='tight')
plt.show()
Starting plotting..
Starting calculating row orders..
Reordering rows..
Starting calculating col orders..
Reordering cols..
Plotting matrix..
Inferred max_s (max size of scatter point) is: 106.46754159267321
Collecting legends..
Plotting legends..
Estimated legend width: 23.6375 mm
../_images/notebooks_gene_enrichment_analysis_16_1.png

In this example, we used two different kinds of markers and cmap to indicate Group1 and Group2, rows are split by category of enriched terms, sizes of dot/marker indicates gene ratio (The ratio of genes annotated by the GO term in your list to the total number of genes in your list).

How to select marker:

6418c362488342bd89d4a9b4f23480a8