Agfam signaling domain library
A central goal of MiST2 is to identify signal transduction proteins accurately and comprehensively. This is accomplished by searching for significant matches to specific domain profiles, most of which come from the Pfam database. Many of these external profiles are built from unrepresentative and poorly constructed multiple sequence alignments. Despite these shortcomings, they perform acceptably for many applications, such as large-scale genome annotation projects; however, they do not accurately and specifically identify all true domain members. We have therefore begun constructing high-quality profiles of signal transduction domains, and we refer to this internal domain library collectively as Agfam.
Domain model construction
This process may result in several distinct profiles that together model a single domain family, yet individually model specific subgroups within that family. In these cases, a significant match is labeled with the domain family name, a colon, and the subgroup name. For example, HK_CA:2 denotes the HK_CA domain family, subgroup 2.
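The labeling convention above can be illustrated with a minimal Python sketch that splits a label into its family and subgroup parts. The names AgfamLabel and parse_agfam_label are hypothetical and are not part of MiST2 or Agfam; the sketch assumes only the "family, colon, subgroup" format described here, and treats a label without a colon as a family with no subgroup.

```python
from typing import NamedTuple, Optional


class AgfamLabel(NamedTuple):
    """Hypothetical container for a parsed Agfam match label."""
    family: str
    subgroup: Optional[str]  # None when the label carries no subgroup


def parse_agfam_label(label: str) -> AgfamLabel:
    """Split a label such as 'HK_CA:2' into family and subgroup.

    Assumes the 'FAMILY:SUBGROUP' convention; a label without a colon
    (for example 'RR') is returned with subgroup set to None.
    """
    family, sep, subgroup = label.strip().partition(":")
    if not family:
        raise ValueError(f"missing domain family name in label: {label!r}")
    if sep and not subgroup:
        raise ValueError(f"missing subgroup name in label: {label!r}")
    return AgfamLabel(family, subgroup if sep else None)


if __name__ == "__main__":
    print(parse_agfam_label("HK_CA:2"))
    print(parse_agfam_label("RR"))
```

Splitting on the first colon only keeps the family name intact even if a subgroup name were ever to contain a colon itself.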
Agfam currently has 25 profiles associated with two primary domains, HK_CA (histidine kinase catalytic domain) and RR (response regulator), which specifically and sensitively model the transmitter domain of sensor histidine kinases and the receiver domain of response regulators, respectively. Because receiver domains are so highly conserved, it was possible to align more than 25,000 of them and produce a single profile, RR. In contrast, the transmitter domain of histidine kinases is much less conserved, and modeling it required 23 specific subgroup profiles and 1 general subgroup profile. One transmitter subgroup, HK_CA:5, was remarkably conserved and was found only in CheA chemotaxis proteins. Because of this association with CheA-type proteins, we labeled this group HK_CA:Che and use it to specifically identify CheA proteins.
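As an illustration of how the HK_CA:Che label could be used downstream, the following Python sketch flags proteins whose significant matches include that label. It is not the MiST2 implementation: the function name find_chea_candidates, the input layout (a mapping from protein identifier to a list of match labels), and the protein identifiers in the usage example are all hypothetical.

```python
from typing import Dict, List

# Label that the text above associates specifically with CheA proteins.
CHEA_LABEL = "HK_CA:Che"


def find_chea_candidates(hits_by_protein: Dict[str, List[str]]) -> List[str]:
    """Return the sorted identifiers of proteins with an HK_CA:Che match.

    hits_by_protein is an assumed input format: each key is a protein
    identifier and each value is the list of Agfam labels that matched
    that protein significantly.
    """
    return sorted(
        protein_id
        for protein_id, labels in hits_by_protein.items()
        if CHEA_LABEL in labels
    )


if __name__ == "__main__":
    # Made-up identifiers and hits, for illustration only.
    example_hits = {
        "protein_A": ["HK_CA:Che", "RR"],
        "protein_B": ["HK_CA:2"],
        "protein_C": ["RR"],
    }
    print(find_chea_candidates(example_hits))
```

Matching on the full label rather than on the family name alone keeps other HK_CA subgroups, such as HK_CA:2, from being reported as CheA candidates.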