MiST4.0: Microbial Signal Transduction Database

Help

Overview

The Microbial Signal Transduction (MiST) database provides a comprehensive classification of the signal transduction systems of bacterial and archaeal genomes stored in the NCBI RefSeq database. MiST 3.0 was the result of substantial scaling to accommodate constantly growing microbial genomic data (1). The database was built from the ground up using modern technologies.

The new release of the database, MiST 4.0, features over 10,000 metagenome-assembled genomes and their signal transduction profiles, scaled representation of proteins, detailed BioSample information, and an updated bacterial and archaeal taxonomy. MiST 4.0 provides a seamless integration of Genomes and Metagenomes within a single website, thus allowing for comparative analyses.

Taxonomy Update

Starting with MiST 4.0, the database incorporates updates to the NCBI taxonomy that reflect the inclusion of the rank "phylum" in the International Code of Nomenclature for Prokaryotes, which involved changes to 42 taxa (Link to the paper).

In addition, the database includes changes within certain phyla in line with the NCBI Taxonomy.

Signal transduction classification schemes

The current version of MiST uses two classification schemes: one categorizes signal transduction proteins (provided as “Genomic distribution of signal transduction proteins” tables) and the other catalogs signal transduction domains (provided as a “Signal transduction profile” graph).

Classification of signal transduction proteins

For signal transduction proteins, we use the “complexity” scheme: one-component, two-component, and chemosensory systems. In addition to classifying pathway components, this scheme separates intracellular and extracellular signal transduction pathways: more than 97% of one-component systems are intracellular sensors, and the vast majority of two-component systems contain extracellular sensors. The disadvantage of this scheme is that it does not clearly separate some categories by protein function and, conversely, splits some functionally related proteins into different categories. For example, the majority of transcription factors and serine/threonine kinases fall into the same category of one-component systems, whereas c-di-GMP cyclases are split among the three main categories (one-component, two-component, or chemosensory) depending on their associated domains and pathways.

An alternative scheme involves classification by protein function, e.g. placing chemoreceptors, histidine kinases, c-di-GMP-cyclases and phosphodiesterases, ser/thr kinases and other key signal transduction proteins in separate categories. While this scheme might seem more attractive, because it emphasizes the functional role of a protein, it has its own shortcomings. For example, the same category of response regulators contains such functionally unrelated proteins as transcription factors and c-di-GMP-cyclases, as long as they are associated with the receiver domain. In the case of chemosensory pathways, the current functional classification scheme splits their components between several categories providing no connection between the elements of the same pathway.

Our protein classification scheme does provide functional categorization for several major signal transduction families: chemoreceptors, histidine kinases, response regulators, and extracytoplasmic sigma factors. In the future, we plan to implement a classification scheme based exclusively on protein function, in addition to our current scheme, so that users can choose which option to use based on the nature of their inquiry.

Generation of signal transduction profiles

Signal transduction pathways contain various protein domains: many of them are unique to signal transduction, whereas others can play roles in other processes. Here again, there is no simple and unambiguous way to classify them. We present a summary of signal transduction domains for each genome as a graph titled “Signal transduction profile”. We classified these domains into seven major categories:

Input (sensory)
- Cofactor-binding (e.g. BLUF domains)
- Enzymatic (enzyme-like ligand-binding domains)
- Protein-protein interactions (e.g. TPR domains)
- Signaling
- Small-ligand binding (e.g. Cache domains)
- Unknown (any input domain whose role in signal transduction is not understood but which is found in association with a known signal transduction domain)
Output (regulatory)
- DNA binding (the majority of transcription factors)
- RNA binding (e.g. ANTAR domain)
- Enzymatic (EAL, GGDEF, Guanylate_cyc domains)
- Protein-protein interactions (e.g. a stand-alone receiver domain)
Chemotaxis (domains specific to chemosensory pathways)
Transmitter (transmit information from Input)
Receiver (receive information from Transmitter)
ECF
Unknown (any domain whose role in signal transduction is not understood but which is found in association with a known signal transduction domain)

To classify extracytoplasmic function (ECF) sigma factor proteins, we used the profile HMMs described in Staron et al.

Because this scheme classifies domains, not proteins, the same protein can appear in several categories. For example, if a protein has a domain whose role in signal transduction is unknown and a well-annotated signal transduction domain (e.g. GGDEF), it will be listed both in the Unknown subcategory of the Input category and in the Enzymatic subcategory of the Output category. Systematic exploration of the Unknown subcategories might lead to the discovery of novel signal transduction domains and to a better understanding of the roles of other domains in signal transduction. Below is a list of signal transduction domains and their annotations used for domain classification.
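The way one protein ends up in several categories can be illustrated with a minimal Python sketch. The lookup table contains only the example domains named in this section; the function name `categorize_domains`, the data layout, and the placeholder domain name `UNANNOTATED_DOMAIN` are hypothetical and do not reflect how MiST stores its annotations. Following the example in the text, a domain absent from the table is placed in the Unknown subcategory of Input.

```python
# Illustrative lookup built only from the example domains named in this section.
# It is not the annotation table used by MiST.
DOMAIN_CATEGORIES = {
    "BLUF": ("Input", "Cofactor-binding"),
    "TPR": ("Input", "Protein-protein interactions"),
    "Cache": ("Input", "Small-ligand binding"),
    "ANTAR": ("Output", "RNA binding"),
    "EAL": ("Output", "Enzymatic"),
    "GGDEF": ("Output", "Enzymatic"),
    "Guanylate_cyc": ("Output", "Enzymatic"),
}


def categorize_domains(domains):
    """Return the sorted, de-duplicated (category, subcategory) pairs for a protein."""
    # Assumption: any domain not in the table is treated as Input/Unknown,
    # mirroring the worked example in the text.
    pairs = {DOMAIN_CATEGORIES.get(name, ("Input", "Unknown")) for name in domains}
    return sorted(pairs)


# A hypothetical protein with one unannotated domain and a GGDEF domain.
print(categorize_domains(["UNANNOTATED_DOMAIN", "GGDEF"]))
# prints [('Input', 'Unknown'), ('Output', 'Enzymatic')]
```

Because the result is keyed by domain rather than by protein, a protein with two domains from the same subcategory (for example EAL and GGDEF) is counted once in that subcategory.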

Signal Transduction specific profile HMMs

Genomic evidence suggested that MCPs of certain heptad classes interact preferentially with certain chemosensory pathway classes defined based on evolutionary considerations (2). Specific profile Hidden Markov models (HMMs) were built for nineteen classes of chemosensory pathways (2) and twelve classes of MCPs (3, 4). Furthermore, a new class of signal transduction proteins called MAC (methyl-accepting coiled-coil proteins) was identified (2), for which no profile HMM is available.

We integrated profile HMMs for different MCP classes, namely 64H, 58H, 52H, 48H, 44H, 42H, 40H, 38H, 36H, 34H, 28H and 24H, into the MiST database. We also integrated profile HMMs for components specific to each chemosensory class (CheA, CheB, CheC, CheD, CheR, CheV and CheZ) (2) and newly built profiles for the MAC1 and MAC2 protein families. Thus, the MiST database offers a comprehensive set of chemosensory pathway-specific HMMs. Using these new profiles in combination with genome neighborhood analysis, the complete chemosensory repertoire of any bacterial or archaeal genome can now be reconstructed.

The profiles can be found here.

Genomes

To search for genomes, select ‘Genomes’ in the search area and type an organism name, a name at any taxonomic level (genus, family, etc.), a RefSeq accession and version, an NCBI taxonomy ID, or a genome assembly level. Genomes can be filtered by taxonomy and assembly level either by using the embedded filter or by selecting the corresponding taxonomic name in the drop-down list on the search results table.

The genome detail page provides comprehensive information about the selected genome, including its BioProject identifier, submitter, and a complete description of its signal transduction systems. The signal transduction profile of a genome is presented as a graph of functional domains with their counts and a table showing the distribution of signal transduction proteins across one-component systems (OCS), two-component systems (TCS), and chemosensory systems. The chemosensory systems table shows the chemosensory pathways encoded in the given genome. Clicking on the graph bars or on the gene counts in the table leads to the list of corresponding signal transduction proteins.

Genes

To search for genes or proteins, select ‘Genes/Proteins’ in the search area and type a gene product name, a genome locus (both old and new tags are accepted), a protein RefSeq identifier, or our unique internal stable identifier, which includes the genome RefSeq ID and gene locus. A gene/protein detail page contains information about the selected gene, its encoded product, the protein domain architecture including details of predicted protein features, and a graphical representation of the gene neighborhood.

Scope

To search for genes/proteins within a genome of interest, use the ‘Scope’ field on the ‘Genes/Proteins’ search page. When a genome name or identifier is entered in the ‘Scope’ field, a list of matching organisms appears. Clicking on one of them restricts the gene and protein search to that genome. The scope can also be set on a genome detail page.

Cart

Genomes and genes can be added to the cart and analyzed in detail, and the encoded protein sequences can be downloaded in FASTA format. A protein title includes the RefSeq protein ID, gene locus, MiST stable ID, protein annotation, and organism name. Genomes and genes added to the cart are marked on the search page to help keep track of the added items. Items added to the cart are stored there for 30 days. Currently, up to 500 items can be added to the cart.
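As an illustration of the fields carried in a downloaded protein title, the following Python sketch assembles one FASTA record. The field order follows the description above, but the `|` separator, the `-` used to join the genome RefSeq ID and gene locus into a stable ID, the line width, and the function names are all assumptions made for this example; the actual layout of MiST downloads may differ. All identifiers in the usage lines are placeholders.

```python
def make_stable_id(genome_refseq_id, locus):
    """Join genome RefSeq ID and gene locus; the '-' separator is an assumption."""
    return f"{genome_refseq_id}-{locus}"


def fasta_record(protein_id, locus, stable_id, annotation, organism, sequence, width=60):
    """Format one FASTA record with a title built from the five described fields."""
    # Assumption: fields are separated by '|'; the source only lists the fields.
    title = "|".join([protein_id, locus, stable_id, annotation, organism])
    # Wrap the sequence into fixed-width lines.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + title + "\n" + "\n".join(lines) + "\n"


# Placeholder identifiers, not real database entries.
sid = make_stable_id("GENOME_ID.1", "LOCUS_0001")
print(fasta_record("PROTEIN_ID.1", "LOCUS_0001", sid,
                   "example annotation", "Example organism", "MKTAYIAKQR"), end="")
```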

RESTful API

The API provides (i) programmatic access to all the data using a variety of identifiers and parameters and (ii) support for large-scale analysis of bacterial and archaeal signal transduction systems. It also allows flexible interaction with the data. The requested data are returned in JSON format. A description of the MiST database data structure, together with detailed query examples in several popular programming languages, is given on the API page.
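A minimal Python sketch of the request-and-parse pattern is shown below. It uses only the standard library. The base URL is a placeholder, and the resource name `genomes` and the query parameter `search` are assumptions made for illustration; the real endpoints and parameters are documented on the API page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder base URL: replace with the address given on the API page.
BASE_URL = "https://example.org/v1"


def build_search_url(resource, term, base_url=BASE_URL):
    """Build a query URL; 'resource' and the 'search' parameter are hypothetical."""
    return f"{base_url}/{resource}?{urlencode({'search': term})}"


def parse_response(body):
    """Decode a JSON response body into Python objects."""
    return json.loads(body)


def fetch_json(url, timeout=30):
    """Perform the HTTP GET request and return the decoded JSON payload."""
    with urlopen(url, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))


# Building the URL needs no network access.
print(build_search_url("genomes", "Escherichia coli"))
# prints https://example.org/v1/genomes?search=Escherichia+coli
```

Keeping URL construction and JSON decoding separate from the network call makes each step easy to check without contacting the server.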

References

1. Gumerov, V.M., Ortega, D.R., Adebali, O., Ulrich, L.E., Zhulin, I.B. (2020) MiST 3.0: an updated microbial signal transduction database with an emphasis on chemosensory systems. Nucleic Acids Research, 48: D459-D464.

2. Wuichet, K. and Zhulin, I.B. (2010) Origins and diversification of a complex signal transduction system in prokaryotes. Science Signaling, 3, ra50.

3. Alexander, R.P. and Zhulin, I.B. (2007) Evolutionary genomics reveals conserved structural determinants of signaling and adaptation in microbial chemoreceptors. Proceedings of the National Academy of Sciences of the United States of America, 104, 2885-2890.

4. Ortega, D.R. and Zhulin, I.B. (2018) Phylogenetic and protein sequence analysis of bacterial chemoreceptors. In Bacterial Chemosensing: Methods and Protocols (ed. Manson, M.D.), Springer New York, 373-385, doi:10.1007/978-1-4939-7577-8_29.
