Agfam signaling domain library

A central goal of MiST2 is to identify signal transduction proteins accurately and comprehensively. This is accomplished by searching for significant matches to specific domain profiles, most of which come from the Pfam database. Many of these external profiles are built from unrepresentative and poorly constructed multiple sequence alignments. Despite these shortcomings, they perform acceptably for many applications, such as large-scale genome annotation projects; however, they fail to identify all true domain members accurately and specifically. We have therefore begun constructing high-quality profiles of signal transduction domains, and we refer to this internal domain library collectively as Agfam.

Domain model construction

Agfam domain models are built using a semi-automatic, two-stage approach. The first stage builds a core set of homologous sequences using exhaustive PSI-BLAST searches seeded with bona fide members (those that have known 3D structures) and a stringent E-value threshold. These sequences are then clustered using the Markov Cluster Algorithm (MCL), aligned, and manually edited. The second stage identifies remote homologs by running PSI-BLAST searches with representative members of the core homolog set and a relaxed E-value threshold. Additional measures, including secondary structure, are used to evaluate the relatedness of distant homologs and to filter out false positives. The final set of homologs (core and remote) is clustered, aligned, and manually refined, yielding one or more groups of domain sequences from which one or more domain profiles are constructed.
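The selection logic of the two stages can be illustrated with a minimal Python sketch. It is not the Agfam pipeline itself: the function names, the input format (a mapping from sequence identifier to best E-value), and both threshold values are assumptions made for illustration, since the text above specifies only that the thresholds are "stringent" and "relaxed". Clustering, alignment, manual editing, and the secondary-structure checks are outside the scope of the sketch.

```python
# Hypothetical thresholds: the text says only "stringent" and "relaxed".
CORE_EVALUE = 1e-10
REMOTE_EVALUE = 1e-3


def select_core(stage1_hits, threshold=CORE_EVALUE):
    """Stage 1: keep hits from the seeded searches that pass the stringent threshold.

    stage1_hits maps a sequence identifier to its best E-value (assumed format).
    """
    return {seq_id for seq_id, evalue in stage1_hits.items() if evalue <= threshold}


def select_remote_candidates(stage2_hits, core, threshold=REMOTE_EVALUE):
    """Stage 2: keep hits that pass the relaxed threshold and are not already core.

    These are only candidates; further evidence (for example, secondary
    structure) is still needed to filter out false positives.
    """
    return {
        seq_id
        for seq_id, evalue in stage2_hits.items()
        if evalue <= threshold and seq_id not in core
    }


if __name__ == "__main__":
    # Made-up identifiers and E-values, purely for demonstration.
    stage1 = {"seqA": 1e-30, "seqB": 1e-5}
    core = select_core(stage1)
    stage2 = {"seqA": 1e-40, "seqB": 1e-5, "seqC": 0.5}
    print(sorted(core))
    print(sorted(select_remote_candidates(stage2, core)))
```

Keeping the two thresholds as separate parameters mirrors the description above: the core set is defined conservatively, and the relaxed search only proposes candidates for further review.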

This process may produce several distinct profiles that together model a single domain family, while each individually models a specific subgroup within that family. In these cases, significant matches are labeled with the family name, a colon, and the subgroup name. For example, HK_CA:2 denotes subgroup 2 of the HK_CA domain family.
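As a small illustration of this naming convention, the following Python sketch splits a match label into its family and subgroup parts. The `DomainLabel` type and the `parse_label` function are hypothetical names introduced here; they are not part of MiST2.

```python
from typing import NamedTuple, Optional


class DomainLabel(NamedTuple):
    family: str
    subgroup: Optional[str]  # None when the label carries no subgroup


def parse_label(label: str) -> DomainLabel:
    """Split a label such as 'HK_CA:2' at the first colon."""
    family, _, subgroup = label.partition(":")
    # An absent or empty subgroup is reported as None.
    return DomainLabel(family, subgroup or None)


if __name__ == "__main__":
    print(parse_label("HK_CA:2"))
    print(parse_label("RR"))
```

A label without a colon, such as RR, is treated as a family with no subgroup.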

Results

Currently, Agfam has 25 profiles associated with two primary domains, HK_CA (histidine kinase catalytic domain) and RR (response regulator), which specifically and sensitively model the transmitter domain of sensor histidine kinases and the receiver domain of response regulators, respectively. Because the receiver domain is so highly conserved, it was possible to align more than 25,000 receiver domains and produce a single profile, RR. In contrast, the transmitter domain of histidine kinases is much less conserved and required 23 specific subgroup profiles and 1 general subgroup profile; together with the RR profile, these account for the 25 profiles. One transmitter subgroup, HK_CA:5, was remarkably conserved and was found solely in CheA chemotaxis proteins. Because of this association with CheA-type proteins, we labeled the group HK_CA:Che and use it to specifically identify CheA proteins.
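To show how these labels might be used downstream, here is a minimal Python sketch that summarizes the Agfam match labels found on one protein. The function name, the input format (a collection of label strings), and the keys of the returned dictionary are assumptions for illustration only; the sketch relies solely on the family names HK_CA and RR and the HK_CA:Che label described above.

```python
CHEA_LABEL = "HK_CA:Che"  # subgroup label reserved for CheA-type proteins


def summarize_hits(labels):
    """Summarize the Agfam match labels of a single protein (hypothetical helper)."""
    labels = list(labels)
    # The family is the part of the label before the first colon.
    families = {label.partition(":")[0] for label in labels}
    return {
        "has_transmitter": "HK_CA" in families,
        "has_receiver": "RR" in families,
        "is_chea": CHEA_LABEL in labels,
    }


if __name__ == "__main__":
    print(summarize_hits(["HK_CA:Che"]))
    print(summarize_hits(["HK_CA:2", "RR"]))
```

Only an exact HK_CA:Che match marks a protein as CheA-type in this sketch; any other HK_CA subgroup counts only as evidence of a transmitter domain.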

Developed and maintained by Agile Genomics, LLC © 2017

Hosted at: UTK || ulrich.luke+sci@gmail.com

Please let us know of any errors, misannotations, or other issues/comments