We evaluated topological predictions for nine different programs: HMMTOP, TMHMM, SVMTOP, DAS, SOSUI, TOPCONS, PHOBIUS, MEMSAT-SVM (hereinafter referred to as MEMSAT), and SPOCTOPUS. These programs were first evaluated using four large topologically well-defined families of secondary transporters, and the three best programs were further evaluated using topologically more diverse families of channels and carriers. In the initial studies, the order of accuracy was: SPOCTOPUS > MEMSAT > HMMTOP > TOPCONS > PHOBIUS > TMHMM > SVMTOP > DAS > SOSUI. Some families, such as the Sugar Porter Family (2.A.1.1) of the Major Facilitator Superfamily (MFS; TC #2.A.1) and the Amino Acid/Polyamine/Organocation (APC) Family (TC #2.A.3), were correctly predicted with high accuracy while others, such as the Mitochondrial Carrier (MC) (TC #2.A.29) and the K+ transporter (Trk) families (TC #2.A.38), were predicted with much lower accuracy. For small, topologically homogeneous families, SPOCTOPUS and MEMSAT were generally most reliable, while with large, more diverse superfamilies, HMMTOP often proved to have the greatest prediction accuracy. We next developed a novel program, TMStats, that tabulates HMMTOP, SPOCTOPUS or MEMSAT-based topological predictions for any subdivision (class, subclass, superfamily, family, subfamily, or any combination of these) of the Transporter Classification Database (TCDB; www.tcdb.org) and examined the following subclasses: α-type channel proteins (TC subclasses 1.A and 1.E), secreted pore-forming toxins (TC subclass 1.C) and secondary carriers (subclass 2.A). Histograms were generated for each of these subclasses, and the results were analyzed according to subclass, family and protein. The results provide an update of topological predictions for integral membrane transport proteins as well as guides for the development of more reliable topological prediction programs, taking family-specific characteristics into account.

Transport proteins function by multiple mechanisms, allowing hydrophilic molecules to cross biological membranes [Lee, 2011]. The simplest of these proteins form pores or channels which allow the free diffusion of molecules from one side of the membrane to the other [Elinder et al., 2007]. Some of these proteins are small peptides that contain only 1 or 2 transmembrane segments (TMSs). In order to form transmembrane pores, these peptides form oligomeric structures with TMSs approximately perpendicular to the plane of the membrane. Others contain many more TMSs, often having arisen by multiplication of a small basic peptide unit with just a few TMSs. The larger number of repeat units minimizes the need for a greater number of subunits necessary to form the pore [Barabote et al., 2006; Pivetti et al., 2003].

Various physical and/or chemical agents often gate the larger proteins but usually not the smaller ones. We have postulated that the simplest of these channel proteins were the primordial systems that gave rise to more complicated channels via intragenic duplication [Saier, 2003]. Two types of channel proteins can be distinguished, one of which exerts its action in the cell that produces it, and the other which targets a cell other than the one that makes it [Saier, 2000]. The latter proteins are toxins that form pores in the membranes of a target organism, releasing nutrients for the predatory organism while killing the target [Fischer et al., 2012; Saris et al., 2009]. Toxins can similarly exist in small and large forms, where simple peptide toxins usually have no more than 1 or 2 TMSs. Although the larger protein toxins may have more, this is not always the case. This is because protein toxins often include protein domains that serve any of a variety of functions, such as subcellular targeting and functional regulation [Barabote et al., 2006].

From larger channel proteins, we have postulated that carriers, capable of recognizing their substrates and shuttling them across the membrane, arose in part as a result of point mutations [Saier, 2003]. In contrast to channel proteins, very few carriers have been documented that exhibit fewer than 4 TMSs. Even in the two or three examples where fewer than 4 TMSs for hypothesized carriers have been suggested, the mode of transport is uncertain. Thus it appears that in order to form a carrier, a larger, more constrained, less oligomeric structure may be required.

In previous studies, we noted the presence of repeat sequences in many secondary carrier proteins [Saier, 2003]. It was this observation that led to the proposed pathway described above. Repeat sequences were initially detected using computer programs that allowed prediction of numbers of TMSs. Several such programs are available. Among these are: HMMTOP (single version) [Tusnady and Simon, 2001], SVMTOP [Lo et al., 2008], TMHMM [Krogh et al., 2001], DAS [Cserzo et al., 2002], SOSUI [Hirokawa et al., 1998], TOPCONS [Bernsel et al., 2009], PHOBIUS [Kall et al., 2004], MEMSAT-SVM (hereafter called MEMSAT) [Nugent and Jones, 2010] and SPOCTOPUS [Viklund et al., 2008]. The authors describing each of these programs have claimed a high degree of accuracy, usually over 90%, with the exceptions of SVMTOP, which has a reported accuracy of over 70%, and HMMTOP, with a reported accuracy of 88.5%. However, independent research groups have seldom confirmed the observations reported by these investigators.

In the present study, we have compared the nine programs mentioned above using several independently evolving families of transport proteins. Initially we examined members of four different well-characterized families, all of known topology, to evaluate the relative accuracies of these nine programs. Using these datasets, we could establish that HMMTOP, SPOCTOPUS and MEMSAT were the top performers for all four families. Consequently, these programs were used to design a novel program, TMStats (http://www.tcdb.org/progs/?tool=tmstats), that provides statistical analyses of integral membrane transport protein topologies. Upon application of TMStats, we found that SPOCTOPUS and MEMSAT commonly performed most accurately when presented with small, topologically homogeneous groups of proteins, but that HMMTOP is the most accurate topological prediction program when certain larger, diverse superfamilies of transport proteins are analyzed.

The Transporter Classification Database (TCDB: www.tcdb.org) categorizes all transport systems according to class, subclass, family, subfamily, and protein [Saier et al., 2006, 2009, 2014]. In addition, a hyperlink exists that delineates superfamily relationships among these families. Altogether, TCDB includes over 800 families, many of them included within superfamilies. The novel TMStats program can examine any of these categories or combinations of these categories simultaneously to make topological predictions.
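As an illustration of how a subdivision can be selected from the hierarchical TC numbers, the following minimal Python sketch parses identifiers and filters them by class, subclass, family or subfamily. It is not the TMStats implementation; the function names (parse_tc, in_subdivision, select) are hypothetical, and the identifiers in the usage note are chosen for illustration only.

```python
from typing import Dict, List

# Names of the five levels encoded in a full TC number such as 1.A.12.3.1.
LEVELS = ("class", "subclass", "family", "subfamily", "protein")


def parse_tc(tc_id: str) -> Dict[str, str]:
    """Split a TC number into its named levels (partial numbers are allowed)."""
    parts = tc_id.strip().split(".")
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"unexpected TC number: {tc_id!r}")
    return dict(zip(LEVELS, parts))


def in_subdivision(tc_id: str, subdivision: str) -> bool:
    """True if tc_id falls under the subdivision.

    Levels are compared one by one, so that 1.A.1 does not match 1.A.12.
    """
    target = subdivision.split(".")
    return tc_id.split(".")[: len(target)] == target


def select(tc_ids: List[str], subdivisions: List[str]) -> List[str]:
    """Keep the TC numbers that belong to any of the requested subdivisions."""
    return [t for t in tc_ids if any(in_subdivision(t, s) for s in subdivisions)]
```

For example, select(ids, ["1.A", "1.E"]) would return the members of both α-type channel subclasses from a list of TC numbers, while select(ids, ["2.A.1.1"]) would restrict the list to a single subfamily.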

After first evaluating the nine above-mentioned programs with 4 selected families, we conducted studies with whole classes of transport proteins. We examined first, α-helical type channels (TC #1.A), second, small α-helical holin-type channel-forming proteins (TC #1.E) frequently involved in bacteriophage lysis or bacterial programmed cell death, third, pore-forming toxins that insert into membranes of a target organism other than the one that produces it (TC #1.C), and fourth, secondary carriers that shuttle substrates across the membrane in a process that involves major conformational changes coupled to the transport cycle (TC #2.A).

Channels and carriers can be distinguished because the former have turnover rates roughly 1,000-fold higher than those of the latter. While channels are often diffusion-limiting, carriers never are. Only the latter generally exhibit high stereospecificity for their substrates. In this paper we analyze these transport proteins by subclass, family, and protein. Using large datasets, we confirm previous results concerning average numbers of TMSs of the different classes of proteins. We also notice characteristics that distinguish families or superfamilies. The results should allow refinement of evolutionary predictions and guides to mechanistic details mediated by these proteins.

Topological Analyses

Several programs were used for comparative purposes to predict integral membrane transport protein topologies. The nine programs examined were HMMTOP, SVMTOP, TMHMM, DAS, SOSUI, TOPCONS, PHOBIUS, MEMSAT and SPOCTOPUS. HMMTOP (http://www.enzim.hu/hmmtop/html/document.html) is a topology prediction program developed by G.E. Tusnady at the Institute of Enzymology at the University of Hungary; it uses a hidden Markov model to predict the number of TMSs [Tusnady and Simon, 2001]. SVMTOP, developed at the Institution of Information Science, Academia Sinica, Taiwan (http://biocluster.iis.sinica.edu.tw/~bioapp/SVMtop/about.php) is a program that predicts transmembrane helices using a ‘support vector machine’ method that hierarchically classifies TMSs based on inside versus outside loops [Lo et al., 2008]. TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2.0/), developed at the Center for Biological Sequence Analysis in Denmark, uses the hidden Markov model to predict transmembrane helices [Krogh et al., 2001]. DAS (http://mendel.imp.ac.at/sat/DAS/DAS.html) is a dissimilar topology prediction program developed at the Biological Research Center at the Institute of Enzymology, Hungarian Academy of Sciences; it uses a ‘dense alignment surface’ algorithm that creates a hydrophobicity profile for the query by comparing it to a predetermined library and scoring matrix (http://mendel.imp.ac.at/sat/DAS/abstract.html) [Cserzo et al., 2002]. SOSUI (http://bp.nuap.nagoya-u.ac.jp/sosui/) is a topology prediction program developed by the Mitaku Group in the Department of Applied Physics at Nagoya University. The batch version of this program was used [Hirokawa et al., 1998]. TOPCONS (http://topcons.cbr.su.se/) was developed at Stockholm University, and uses multiple topology prediction algorithms to generate a consensus prediction [Bernsel et al., 2009]. PHOBIUS (http://phobius.sbc.su.se/) was created at the Center for Genomics and Bioinformatics in the Karolinska Institute in Stockholm, Sweden [Kall et al., 2004]. MEMSAT (http://bioinf.cs.ucl.ac.uk/psipred/) uses an improved support vector machine model relative to SVMTOP that was developed by the Department of Computer Science: Bioinformatics Group at the University College London [Nugent and Jones, 2010]. Finally, SPOCTOPUS (http://octopus.cbr.su.se/) was developed at Stockholm University and uses the OCTOPUS algorithm to detect TMSs and a signal peptide prediction algorithm to detect signal peptides; OCTOPUS uses a combination of hidden Markov models and neural networks along with a BLAST search to generate a sequence profile that is annotated with transmembrane properties [Viklund et al., 2008].

In order to set a standard for accuracy, Average Hydropathy, Amphipathicity and Similarity (AveHAS) plots were generated. Because the input file for the AveHAS program is a multiple alignment file produced by the ClustalX program, the results for many proteins are averaged, giving plots that provide much greater predictive accuracy than is possible with individual sequences [Zhai and Saier, 2001a]. The multiple alignments used in the generation of these plots can be found as supplementary materials on our website (http://www.biology.ucsd.edu/~msaier/supmat/TMStats/index.html).
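The averaging principle can be illustrated with a minimal Python sketch that computes the mean hydropathy of each column of a multiple alignment, ignoring gaps. This is not the AveHAS code: the Kyte-Doolittle scale is used here as an assumption (the scale and smoothing used by AveHAS are not specified above), and the function name column_average_hydropathy is hypothetical.

```python
from typing import List, Optional

# Kyte-Doolittle hydropathy values (assumed scale for this illustration).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_average_hydropathy(alignment: List[str]) -> List[Optional[float]]:
    """Mean hydropathy per alignment column.

    Gap characters and residues absent from the scale are skipped;
    a column with no scorable residue yields None.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    averages: List[Optional[float]] = []
    for column in zip(*alignment):
        values = [KD[res] for res in column if res in KD]
        averages.append(sum(values) / len(values) if values else None)
    return averages
```

Because each column value is a mean over many homologues, sequence-specific fluctuations tend to cancel, which is one plausible reason why conserved hydrophobic peaks stand out more clearly in averaged plots than in plots of individual sequences.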

Another program, called WHAT (Web-based Hydropathy and Amphipathicity) allows the generation of hydrophobicity plots for single sequences and provides TMS predictions using HMMTOP; the input for this program is the query sequence in FASTA form. Usage of the WHAT program allows topological verification by counting the number of hydrophobic peaks that represent potential TMSs [Zhai and Saier, 2001b].
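Counting hydrophobic peaks in a single-sequence plot can be sketched as follows. This is a simplified illustration, not the WHAT program: the Kyte-Doolittle scale, the 19-residue window and the threshold of 1.6 are assumptions taken from the classical hydropathy approach, and the function names are hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale for this illustration).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for residues that are not in the scale.
    """
    scores = [KD[res] for res in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def count_peaks(profile: List[float], threshold: float = 1.6) -> int:
    """Count contiguous runs of window positions above the threshold."""
    peaks = 0
    inside = False
    for value in profile:
        if value > threshold and not inside:
            peaks += 1  # a new run above the threshold begins here
        inside = value > threshold
    return peaks
```

Each run above the threshold is counted as one candidate TMS. Semihydrophobic regions that do not span the membrane can also exceed such a threshold, which is why peak counts from single sequences require careful interpretation.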

Comparison of Nine Topological Prediction Programs Using Proteins from Four Different Superfamilies

In our initial studies, we chose to compare frequently used methods of integral membrane topological prediction using four moderately sized families of transport systems for which the topologies have been established experimentally. These families are (1) the Sugar Porter Family (TC #2.A.1.1) of the Major Facilitator Superfamily (MFS), members of which have 12 experimentally established TMSs [Pao et al., 1998], (2) the Amino Acid-Polyamine-Organocation (APC) Family (TC #2.A.3) within the APC Superfamily, members of which have 10, 12, 14 or 15 established TMSs [Jack et al., 2000], (3) the Mitochondrial Carrier (MC) Family (TC #2.A.29) within the MC Superfamily, members of which have 6 established TMSs [Palmieri, 2013] and (4) the Potassium Transporter (Trk) Family (TC #2.A.38) within the VIC Superfamily, members of which have 8 established TMSs [Kato et al., 2001; Zeng et al., 2004]. The data in this section was obtained on 3/11/2012.

The nine programs examined are listed in table 1. The test set used for the Sugar Porter Family within the MFS consisted of 84 proteins derived from TCDB. These proteins were multiply aligned (ClustalX), and the average topology for these 84 proteins, based on the AveHAS program, is shown in figure 1a. By averaging the results for these proteins, the prediction of topology becomes clear. The peaks of hydropathy corresponding to TMSs are labeled 1 through 12. As is established for the MFS [Guan et al., 2006], these proteins consist of two halves, each of which contains six TMSs [Pao et al., 1998]. The picture obtained by averaging the hydropathy predictions for these sequences is much clearer than when the individual protein hydropathy plots, using the WHAT program, were displayed. The peaks of hydropathy shown in the top panel correspond to the peaks represented by vertical lines in the bottom panel, and also correspond to peaks of similarity as shown by the dashed line in the bottom panel. This plot agrees with our general observation that the TMSs in integral membrane transport proteins are better conserved than the hydrophilic loop regions between them. It also confirms experimental data, including X-ray crystallographic studies, showing that these proteins have 12 TMSs [Guan et al., 2006].

Table 1

Comparison of nine topology prediction algorithms to evaluate prediction accuracy


Fig. 1

Average hydropathy, amphipathicity and similarity plots using the AveHAS program for (a) the Sugar Porter Family in the MFS, (b) the APC family in the APC Superfamily, (c) the Mitochondrial Carrier (MC) Family within the MC Superfamily, (d) the Trk Family in the VIC Superfamily (see TCDB). The alignment position is recorded on the x-axis.


The data presented in table 1A reveals the number of proteins predicted to have anywhere between 6 and 13 TMSs. For the MFS, the HMMTOP program predicted 92% of the proteins (77 proteins) to have 12 TMSs; the remaining 8% (7 proteins) were predicted to have either 10 or 11 TMSs. Examination of the plots for these 7 proteins revealed that HMMTOP missed one or two of the TMSs for each protein. The second program listed in table 1, SVMTOP, proved to be much less reliable, with only 49 proteins predicted to have 12 TMSs. The others were predicted to have 10, 11 or 13 TMSs. The third program, TMHMM, predicted 54 proteins to have 12 TMSs, and the exceptions had anywhere from 8 to 11 TMSs. The DAS program predicted only 36 proteins to have 12 TMSs, with large numbers of proteins predicted to have 7 through 11 and 13 TMSs. The SOSUI program predicted only 29 proteins to have 12 TMSs, with the others having anywhere from 6 to 13 TMSs. TOPCONS predicted 75 proteins to have 12 TMSs, with the remainder of the proteins having between 9 and 11 TMSs. PHOBIUS performed most poorly, predicting no proteins to have 12 TMSs and the majority of proteins to have either 9 or 10 TMSs. MEMSAT predicted 78 proteins to have 12 TMSs, with a total of 6 proteins having either 9, 10, or 11 TMSs. Finally, SPOCTOPUS was the top performer, predicting 80 proteins to have 12 TMSs, 1 protein to have 9 TMSs, and 3 proteins to have 11 TMSs. Thus, the top performers were SPOCTOPUS, MEMSAT and HMMTOP, predicting 77-80 of 84 proteins correctly.
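The accuracy figures quoted above follow from simple tabulation, which can be reproduced with the short Python sketch below. The counts are those stated in the text for the 84 sugar porters; the function and variable names are hypothetical and do not correspond to any published code.

```python
from typing import Dict


def fraction_correct(counts: Dict[int, int], expected_tms: int) -> float:
    """Fraction of proteins whose predicted TMS count equals the expected one."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions supplied")
    return counts.get(expected_tms, 0) / total


# SPOCTOPUS distribution (predicted TMS count -> number of proteins), from the text.
spoctopus = {12: 80, 11: 3, 9: 1}

# Number of proteins predicted to have 12 TMSs by each program, from the text.
correct_12 = {
    "HMMTOP": 77, "SVMTOP": 49, "TMHMM": 54, "DAS": 36, "SOSUI": 29,
    "TOPCONS": 75, "PHOBIUS": 0, "MEMSAT": 78, "SPOCTOPUS": 80,
}

# Rank the programs by the number of correct 12-TMS predictions.
ranking = sorted(correct_12, key=correct_12.get, reverse=True)

if __name__ == "__main__":
    print(f"SPOCTOPUS: {fraction_correct(spoctopus, 12):.1%} correct")
    for program in ranking:
        print(f"{program}: {correct_12[program] / 84:.0%}")
```

Note that this ranking applies to the Sugar Porter Family only; as the following paragraphs show, the order differs for the other three families.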

The average hydropathy plot for the APC family (91 proteins derived from TCDB) is shown in figure 1b. Of the 91 proteins included in this study, 83 are believed to have 12 TMSs, four have 10 TMSs, six have 14, and one has 15 TMSs (see section on the APC Superfamily below). The average hydropathy plot revealed 12 well-conserved peaks of hydropathy as expected for the dominant members of this family.

Examination of table 1B reveals that TOPCONS predicts the largest number of proteins to have 12 TMSs, but HMMTOP appears to have the best overall prediction accuracy as it found sixty-nine 12 TMS proteins, and correctly predicted the four 10 TMS proteins as well as the six 14, and one 15 TMS proteins. With regard to the APC family, the order of correct predictions was HMMTOP > TOPCONS > PHOBIUS > MEMSAT > SPOCTOPUS > TMHMM > SVMTOP > DAS > SOSUI.

Members of the MC Superfamily are known to have 6 TMSs with no reported topological variations. Examination of the hydropathy plots for the 88 proteins derived from TCDB and included in this study revealed that, with one exception, all could be interpreted as having 6 TMSs. However, these plots were usually ambiguous in contrast to the MFS and APC superfamilies. SPOCTOPUS was the strongest predictor, with 79 out of 88 proteins (89%) predicted correctly. MEMSAT produced the next best results, predicting the correct topology for 68/88 proteins (77%). HMMTOP predicted 23 proteins to have 6 TMSs, with large numbers predicted to have fewer than 6 TMSs. Only two were predicted to have 7 TMSs. Thus, only 26% of these proteins were correctly predicted. By contrast, very few proteins were predicted to have 6 TMSs by any of the other six programs used (table 1C). It is clear that while SPOCTOPUS and MEMSAT predicted the correct topology for these proteins well, the other programs did extremely poorly.

While hydropathy plots for the individual proteins in the MC Superfamily were often confusing, the use of the AveHAS program to generate average hydropathy and similarity plots for the 88 proteins gave clear results as shown in figure 1c. Here, one can see that 6 TMSs are predicted, where TMSs 1, 3 and 5 are high sharp peaks, while 2, 4 and 6 are lower and broader. This pattern reflects the presence of three 2 TMS repeat units present in all mitochondrial carriers. Each of these 6 peaks is well conserved. These results again illustrate the advantage of using the AveHAS program for topological predictions.

Potassium transporters of the Trk Family (20 proteins from TCDB included in this study) were predicted less accurately than the mitochondrial carriers. These proteins are known to have 4 repeat units derived from the channel-forming element of members of the voltage-gated ion channel superfamily [Kato et al., 2001; Lo et al., 2008; Zeng et al., 2004]. In fact, the Trk Family is a constituent member of the VIC Superfamily (see TCDB superfamilies: http://tcdb.org/superfamily.php). These channels consist of 2 TMSs with a central semihydrophobic P-loop that dips into the membrane but does not traverse it. This topology was not readily apparent when individual proteins were examined with the WHAT program, but the situation was much clearer when the average hydropathy plot was displayed. This plot revealed four quadrants, each with 2 hydrophobic peaks separated by a small semipolar peak. Odd-numbered peaks, as indicated in figure 1d (the first TMS in each repeat unit) are sharp and high, while even-numbered peaks (the second TMS in each repeat unit) are broader and lower, the same pattern noted above for MC Family members. The P-loop is apparent in all four quadrants. This plot again reveals the greater predictive capabilities observed when many proteins are averaged to give a hydropathy plot.

The predictions obtained for the Trk Family using the nine different programs are summarized in table 1D. The majority of the programs predicted fewer than 10 of the 20 proteins to have 8 TMSs, with SPOCTOPUS being the only exception, predicting 10 proteins to have 8 TMSs; MEMSAT was a close second, predicting 7 proteins correctly. In contrast to the MC Family discussed above, virtually all mispredictions were overpredictions (except in the cases of SPOCTOPUS and MEMSAT). These overpredictions resulted because some or all of the P-loops were counted as TMSs. The proportion of correct predictions was therefore at best only 50% (SPOCTOPUS) and lower for all remaining programs. In fact, all programs predicted the average total number of TMSs to be between 9.7 and 10.8. Thus, in this case, none of the programs provided accurate predictions. This can be explained by the fact that all of them predict at least some of the P-loops to be transmembrane.

Summarizing, for the MFS and APC superfamilies, SPOCTOPUS, MEMSAT and HMMTOP are the most reliable programs for topological predictions, while TOPCONS follows. The other five programs are less reliable. However, reliability with any program depends upon the family of proteins being analyzed. Some families, such as the MC Family, were poorly predicted by most programs, and all programs poorly predicted the Trk Family. It is imperative that improved prediction methods be developed.

As noted above, the results presented in this section reveal that in general, SPOCTOPUS, MEMSAT and HMMTOP are the most reliable programs available for predicting the topologies of integral membrane transport proteins. For that reason, we chose to use these programs for the quantitative evaluation of predictions for various TC classes, subclasses, superfamilies, families, subfamilies, or any combination of these using the integrated TMStats program (see Methods section). While using three programs theoretically should afford the most comprehensive prediction coverage, we show that SPOCTOPUS and MEMSAT mostly produce the more accurate results when certain small, topologically similar families of proteins are considered, but HMMTOP produces the most accurate predictions when considering certain large families including several superfamilies.

Transmembrane α-Helical Channels (Subclass 1.A)

Subclass 1.A includes channel-forming proteins that consist primarily of α-helical TMSs. The entire subclass was analyzed collectively for topological types using TMStats with the three topology prediction programs, HMMTOP, MEMSAT and SPOCTOPUS, both with and without auxiliary proteins. Including all auxiliary proteins, a total of 916 proteins were analyzed, and without these auxiliary proteins, there were 820 proteins (5/30/2013). The average numbers of TMSs, including auxiliary proteins, were 5.5 ± 4.9 SD using HMMTOP, 5.4 ± 5.3 SD using MEMSAT, and 5.1 ± 5.0 SD using SPOCTOPUS; without the auxiliary proteins, the averages were 5.9 ± 5.0, 5.7 ± 5.4, and 5.4 ± 5.1 SD, respectively. In the analyses reported below, only the results obtained when auxiliary proteins were excluded are reported.
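Summary statistics of this kind can be computed as in the following minimal Python sketch. Whether TMStats reports the sample or the population standard deviation is not stated above; the sample SD is assumed here, and the function name mean_sd and the example counts in the usage note are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def mean_sd(tms_counts: Sequence[int]) -> Tuple[float, float]:
    """Mean and sample standard deviation of predicted TMS counts."""
    if len(tms_counts) < 2:
        raise ValueError("need at least two proteins to estimate the SD")
    return statistics.mean(tms_counts), statistics.stdev(tms_counts)
```

For a hypothetical set of predicted counts such as [0, 1, 2, 2, 4, 4, 6, 6, 11, 24], the mean is 6 and the SD is about 7.1. An SD comparable in size to the mean, as reported for subclass 1.A, indicates a broad distribution in which a minority of large proteins coexists with many small ones.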

A plot of topological types with frequencies of occurrence on the Y-axis and the numbers of TMSs on the X-axis revealed the distributions using all three programs (fig. 2). There were more 2 than 1 TMS proteins, more 4 than 3 TMS proteins, and more 6 than 5 TMS proteins, showing that channel-forming proteins with even numbers of TMSs are favored. This was true regardless of which of the three programs was used (fig. 2). This observation reinforces the conclusion of an earlier publication using a much smaller dataset [Saier, 2003]. The prevalence of proteins with even numbers of TMSs can be explained by the fact that duplication of any number of TMSs gives rise to proteins with even numbers of TMSs, and most transport proteins have arisen via pathways involving intragenic duplication. Surprisingly, however, there are substantially more 11 than 10 TMS proteins, regardless of the program used. Of the proteins predicted to have 11 TMSs, the majority proved to belong to the Amt channel family (1.A.11) regardless of the program used. In fact, the two Amt channel proteins for which high-resolution X-ray structures are available display 11 established TMSs, and most members of this family are predicted to have 11 TMSs. Of the remaining 11 TMS proteins, HMMTOP and MEMSAT predict the cholesterol/dsRNA uptake (CUP) family (1.A.79) to include the most such proteins. This is in accordance with one of the two models proposed for the topology of these proteins. Examining the larger proteins, those predicted to have 16-25 TMSs, even-numbered proteins are more numerous than odd-numbered proteins, almost without exception.
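The even-versus-odd comparison can be checked directly from the predicted counts for 1 to 6 TMSs reported in this section for subclass 1.A. The Python sketch below is only an illustration of that comparison; the function name even_vs_odd_pairs is hypothetical.

```python
from typing import Dict, List, Tuple


def even_vs_odd_pairs(hist: Dict[int, int], max_tms: int) -> List[Tuple[int, int, bool]]:
    """For each odd n below max_tms, report whether proteins with n + 1 TMSs
    outnumber proteins with n TMSs."""
    return [
        (n, n + 1, hist.get(n + 1, 0) > hist.get(n, 0))
        for n in range(1, max_tms, 2)
    ]


# Predicted TMS count -> number of proteins (subclass 1.A, values from the text).
hmmtop = {0: 31, 1: 48, 2: 97, 3: 75, 4: 127, 5: 60, 6: 142, 7: 78}
memsat = {0: 5, 1: 76, 2: 132, 3: 67, 4: 131, 5: 101, 6: 112, 7: 36}
spoctopus = {0: 39, 1: 41, 2: 117, 3: 86, 4: 144, 5: 104, 6: 134, 7: 27}

if __name__ == "__main__":
    for name, hist in (("HMMTOP", hmmtop), ("MEMSAT", memsat), ("SPOCTOPUS", spoctopus)):
        for odd, even, even_favored in even_vs_odd_pairs(hist, max_tms=6):
            print(f"{name}: {even} TMSs > {odd} TMSs? {even_favored}")
```

With these values, the even-numbered member of each pair (1 vs. 2, 3 vs. 4, 5 vs. 6) is the more frequent one for all three programs, in agreement with the statement above.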

Fig. 2

Comparative distribution of topological types predicted using the TMStats program for HMMTOP in black, MEMSAT in white and SPOCTOPUS in grey, for the proteins included in subclass 1.A of TCDB as of 5/29/2013.


The absolute numbers of proteins having various predicted topologies in subclass 1.A are also indicated in figure 2. HMMTOP predicted 31 proteins to have 0 TMSs, 48 to have 1 TMS, 97 to have 2 TMSs, 75 to have 3 TMSs, 127 to have 4 TMSs, 60 to have 5 TMSs, 142 to have 6 TMSs, and 78 to have 7 TMSs. MEMSAT predicted only 5 proteins to have 0 TMSs, 76 to have 1 TMS, 132 to have 2, 67 to have 3, 131 to have 4, 101 to have 5, 112 to have 6, and 36 to have 7 TMSs. SPOCTOPUS predicted 39 proteins to have 0 TMSs, 41 to have 1, 117 to have 2, 86 to have 3, 144 to have 4, 104 to have 5, 134 to have 6, and 27 to have 7 TMSs. From these numbers it is clear that HMMTOP and SPOCTOPUS agree reasonably well, while MEMSAT greatly underpredicts 0 TMS proteins and overpredicts 1 and 2 TMS proteins. This error when using MEMSAT will be discussed below.

Only about 20% of the proteins had more than 7 TMSs regardless of the prediction program used. These observations confirm our earlier conclusion that a majority of channel-forming proteins are small with few TMSs, while carriers and primary active transporters are larger with more TMSs [Saier, 2003] (see below).

0 TMS Channels

We first analyzed the proteins predicted to have 0 TMSs. With HMMTOP, 14 families were represented among the 31 proteins in this category. Of these, 3 families had 4-5 members each predicted to have 0 TMSs: the Intracellular Chloride Channel (CLIC) Family (1.A.12), the Annexin (Annexin) Family (1.A.31), and the Nucleotide-Sensitive Anion-Selective Channel (ICln) Family (1.A.47). Three members each of the Epithelial Chloride Channel (E-ClC) family and the Poliovirus 2B Viroporin (2B Viroporin) family were represented. Two members each of the Brain Acid-Soluble Protein Channel (BASP1 Channel) Family (1.A.71) and the Mitochondrial EF Hand Ca2+ Uptake Porter/Regulator (MICU) Family (1.A.76) were also in this category. Each of the remaining proteins predicted to have 0 TMSs was the only member of its respective family included in TCDB at the time of these studies. Protein families known to be bifunctional, with one function associated with a soluble form and the other function associated with the membrane-integrated channel-forming form, are listed in table 2. Using SPOCTOPUS, the CLIC family had 9 proteins predicted to have 0 TMSs, while 4 members each from the E-ClC, Annexin, and ICln families were represented. Three proteins each from the Cation Channel-forming Heat Shock Protein-70 (HSP-70) family and the MICU family lacked predicted TMSs, and 2 proteins from the 2B Viroporin family were represented. Members of the Cation-Selective Channel-forming Heat Shock Protein-70 (Hsp70) Family (1.A.33) are predicted to have 0 or 1 TMSs per polypeptide chain. These proteins are normally present as soluble chaperone proteins, but they apparently can insert into the membranes of eukaryotes to form cation-selective channels [Arispe and De Maio, 2000]. This is another example of proteins that can exist in either soluble or membrane-integrated forms (table 2). Each of the remaining proteins predicted to have 0 TMSs was the only member of its respective family included in TCDB at the time of these studies.

Table 2

Families of bi- or multifunctional proteins that exist in both soluble and membrane-integrated channel-forming states


The CLIC family includes proteins that have dual functions; first, they are soluble glutathione-S-transferases, and second, they have the capacity to insert into the membrane to form channels. One of these proteins (1.A.12.3.1) is the bacterial CLIC homologue, stringent starvation protein A, SspA of Escherichia coli. Five out of 8 CLIC family members were tabulated as being 0 TMS proteins by HMMTOP, while the other 3 are predicted to have 1 TMS. SPOCTOPUS predicted all 8 to lack TMSs while MEMSAT predicted all to have 1 TMS. Although the topology of the membrane-integrated form is not yet known, the ambiguous nature of these proteins presumably reflects their ability to exist both in soluble and membrane-integrated forms.

Annexins similarly have 0 putative TMSs according to HMMTOP and SPOCTOPUS, but 1 TMS according to MEMSAT. Annexins are structurally conserved proteins that mediate reversible Ca2+-dependent intracellular membrane/phospholipid binding. Like CLIC family members, these proteins can exist in both soluble and membrane-associated forms. Membrane association is critical for their proposed functions that include vesicle trafficking, membrane repair, membrane fusion and ion channel formation [McNeil et al., 2006].

A fourth family with members exhibiting 0 TMSs by HMMTOP and SPOCTOPUS, but not MEMSAT, is the ICln Family (1.A.47). ICln proteins are multifunctional proteins in animals, being essential for cell volume regulation. They are found in the cytosol but are also associated with the cell membrane. They regulate cell volume by activating a swelling-induced Cl- conductance pathway. ICln reconstituted in artificial bilayers forms ion channels [Ritter et al., 2003]. Cell swelling causes ICln to redistribute from the cytosol to the cell membrane. The coexistence of these proteins as both soluble and membrane-integrated forms again explains the prediction that they exhibit no TMSs.

Like members of the 3 families described above, members of the BASP1 Family (1.A.71) lack observed hydrophobic peaks in hydropathy plots, and again, while HMMTOP and SPOCTOPUS predicted 0 TMSs, MEMSAT predicted 1. These proteins become membrane-associated by virtue of myristoylation and show cation-selective ion channel activity in artificial membranes [Ostroumova et al., 2011]. Thus, the majority of the BASP1 proteins predicted to have 0 TMSs by HMMTOP and SPOCTOPUS exist as soluble proteins that can insert into membranes as a result of lipid derivatization. In summary, and in accordance with other results, HMMTOP and SPOCTOPUS more reliably predict topologies with 0-3 TMSs compared to MEMSAT. Several soluble proteins can insert into membranes to form channels. However, the configuration of the polypeptide chains in association with membranes, in general, is not known.

1 TMS Channels

HMMTOP predicted 48 proteins to have 1 TMS, MEMSAT predicted 76 proteins to have 1 TMS, and SPOCTOPUS predicted 41 proteins to have 1 TMS. We found that 0 TMS proteins were consistently predicted to have 1 TMS by MEMSAT, but not by HMMTOP or SPOCTOPUS (see above). All 6 members of the Phospholemman (PLM) Family (1.A.27) in TCDB were predicted to have a single TMS using HMMTOP, but using SPOCTOPUS, 4 members were predicted to have 1 TMS, and 2 were overpredicted to have 2 TMSs. These proteins are known to have a single TMS [Cheung et al., 2013; Kowdley et al., 1997; Moorman et al., 1995] and function in a variety of capacities, both as regulators of Na+,K+-ATPases and as anion-selective channels (table 3) [Geering, 2006; Nilius et al., 1996].

Table 3

Bifunctional proteins that can form integral membrane channels


Bcl-2 proteins (1.A.21), involved in both necrosis and apoptosis, play both death and anti-death roles in higher eukaryotes [Arbel and Shoshan-Barmatz, 2010]. These proteins may have a single C-terminal TMS that serves to anchor them to the membrane, but all three programs predicted more 2 TMS proteins than 1 TMS proteins. Like phospholemmans and their homologues, all members appear to have very similar topologies (table 3).

The Colicin Lysis Protein (CLP) Family (1.A.73) [Cavard, 2002; Chen et al., 2011] consists of 3 members, all of which have a single N-terminal TMS using HMMTOP and MEMSAT; with the SPOCTOPUS algorithm, 1 member is underpredicted to have 0 TMSs, while the remaining 2 are predicted to have 1 TMS. In the case of all channel-forming proteins having a single TMS, one can predict that formation of the channel depends upon the formation of oligomeric structures, either homo- or heterooligomers. As noted above, single TMS channel-forming peptides are common, especially within TC subclasses 1.C (pore-forming toxins) and 1.E (holins). Many pore-forming families consist of members that are single peptides of less than 100 residues with a single TMS. They are from viruses and a wide variety of organisms from bacteria to man.

2 TMS Channels

97 proteins were predicted to have 2 TMSs using HMMTOP, 132 proteins with MEMSAT and 117 using SPOCTOPUS. Families with multiple 2 TMS members will be discussed. The first of these is the Voltage-gated Ion Channel (VIC; 1.A.1) Family within the VIC Superfamily. The channel is formed by tetramers of 2 TMS subunits, each separated by a well-conserved P-loop. The 2 TMS members of the VIC Superfamily retrieved in this search were all of this type. Several members of the VIC family and the Inward Rectifier K+ Channel (IRK-C) Family (1.A.2) are predicted to have 3 TMSs. When 3 TMSs are predicted, the moderately hydrophobic P-loop is predicted to be transmembrane, thus explaining the erroneous prediction. Out of 17 proteins in the IRK-C family, HMMTOP predicts only 1 to have 2 TMSs, with 12 proteins predicted to have 3 TMSs; in contrast, both MEMSAT and SPOCTOPUS correctly predicted all 17 proteins to have 2 TMSs. Ion channels of both families can be homo- or heterooligomeric tetrameric structures.
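The IRK-C counts above can be tallied with a short sketch. It is an illustration only and not part of TM-STATS: the function name `classify` and the data layout are assumptions, and only the counts stated in the text are used (the HMMTOP predictions not reported above are left unassigned rather than guessed).

```python
from collections import Counter

def classify(predicted: int, known: int) -> str:
    """Label a prediction relative to the established TMS count."""
    if predicted == known:
        return "correct"
    return "overpredicted" if predicted > known else "underpredicted"

KNOWN_TMS = 2      # established IRK-C topology
FAMILY_SIZE = 17   # IRK-C proteins in TCDB, per the text

# HMMTOP counts reported in the text: {predicted TMSs: number of proteins}.
# Only 13 of the 17 predictions are reported; the rest are not guessed here.
hmmtop_reported = {2: 1, 3: 12}

tally = Counter()
for predicted, n_proteins in hmmtop_reported.items():
    tally[classify(predicted, KNOWN_TMS)] += n_proteins

unreported = FAMILY_SIZE - sum(hmmtop_reported.values())
print(dict(tally), "unreported:", unreported)
print(f"HMMTOP correct: {tally['correct'] / FAMILY_SIZE:.1%}")
```

The 12 overpredictions correspond to the cases in which the P-loop is counted as a third TMS.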

The Epithelial Na+ Channel (ENaC) Family (1.A.6) and the ATP-gated P2X Receptor Cation Channel (P2X Receptor) Family (1.A.7) are members of a single superfamily, and all members of both families have 2 TMSs separated by a large hydrophilic extracytoplasmic domain. They are involved in Na+ and Ca2+ transport. These channels generally exhibit heterotetrameric architectures. Protein members of this superfamily all exhibit the same apparent topology, each with N- and C-termini on the inside of the cell and two amphipathic transmembrane spanning segments, M1 and M2 [Gonzales et al., 2009].

The Mer Superfamily (1.A.72) can be split into 5 families including MerC, MerE, MerH, MerP, and MerT. All 5 families show sequence similarity within TMSs 1 and 2, but TMSs 3 and 4, when present, are either non-homologous or arose by an intragenic duplication event [Mok et al., 2012; Yamaguchi et al., 2007]. These channels all catalyze uptake of Hg2+ into bacterial cells in preparation for reduction by mercuric reductase, MerA.

Additional families that exhibit 2 putative TMSs are the Non-Selective Cation Channel-2 (NSCC2) Family (1.A.15), the Chloroplast Envelope Anion Channel-forming Tic110 (Tic110) Family (1.A.18), the Bcl-2 (Bcl-2) Family (1.A.21) and the CorA Metal Ion Transporter (MIT) Family (1.A.35). The 2 TMSs in most of these families are in close proximity to one another. An X-ray structure for the E. coli CorA protein has established the 2 TMS topology. The Membrane Mg2+ Transporter (MMgT) Family (1.A.67) includes members that all have 2 TMSs.

3 TMS Channels

75 proteins in TCDB were predicted by HMMTOP to have 3 TMSs, 67 by MEMSAT, and 86 by SPOCTOPUS. Within the Bacterial Flagellar Motor/Outer Membrane Transport Energizer (MotAB-ExbBD) Superfamily (1.A.30), 3 were predicted to have 3 TMSs while 5 proteins were predicted to have 4. In fact, MotA members of the MotAB family have 4 established TMSs while the homologous ExbB and TolQ proteins have 3 TMSs. In the latter proteins, the 3 TMSs correspond to TMSs 2-4 in the former proteins [Yonekura et al., 2011].

The Ctr family of copper channels (1.A.56) probably exhibits a uniform topology which is however difficult to predict. The hydropathy plot reveals 2 hydrophobic peaks, the second of which is broad. This peak is predicted to include 1 or 2 TMSs, depending on the protein, but the 3 TMS topology is favored with 2 TMSs predicted near the C-termini. These eukaryotic proteins can trimerize and harbor a putative copper-binding M-XC-XM-XM motif near their N-termini that is essential for function [Banci et al., 2010; Dumay et al., 2006; Petris, 2004].

4 TMS Channels

Some members of the VIC family contain 4 TMSs per polypeptide chain, and 2 such proteins form homodimeric channels with 4 channel-forming units and a total of 8 TMSs per channel. Insufficient sequence similarities make recognition of the P-loops difficult. As in other members of the VIC Superfamily, these P-loops play important roles in ion-selectivity and ion flux control.

14 out of 23 proteins in the Neurotransmitter Receptor, Cys loop, Ligand-Gated Ion Channel (LIC) Family (1.A.9) display a correctly predicted 4 TMS topology using HMMTOP, 22/23 using MEMSAT, and 11/23 using SPOCTOPUS. The hydropathy plots reveal 4 narrow peaks, 2 of them close to each other, 1 lone TMS at the N-termini, and another lone TMS at the C-termini. Members of this family have a ligand-binding domain with a number of key residues that are conserved [Connolly, 2008]. The five subunits are arranged in a ring with their ‘M2' transmembrane helical spanners lining the central channel. They come together in the middle of the membrane to form the channel gate, and the gate opens upon binding acetylcholine or another ligand [Thompson and Williamson, 2010].
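As a minimal sketch, the per-program accuracy for the LIC family can be computed and ranked from the counts given above. The variable names are illustrative assumptions; the numbers are those reported in the text.

```python
from fractions import Fraction

FAMILY_SIZE = 23  # LIC proteins analyzed, per the text

# Proteins with a correctly predicted 4 TMS topology, per the text.
correct_4tms = {"HMMTOP": 14, "MEMSAT": 22, "SPOCTOPUS": 11}

# Exact fractions avoid rounding surprises when ranking.
accuracy = {prog: Fraction(n, FAMILY_SIZE) for prog, n in correct_4tms.items()}
ranking = sorted(accuracy, key=accuracy.get, reverse=True)

for prog in ranking:
    print(f"{prog:10s} {float(accuracy[prog]):.1%}")
```

For this family the ranking is MEMSAT > HMMTOP > SPOCTOPUS, which differs from the order found for several other families in this section.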

Another family that presents a 4 TMS topology is the gap junction-forming Connexin Family (1.A.24). The hydropathy plot suggests a 2 TMS duplication, creating the 4 TMS display. The channels consist of clusters of closely packed pairs of connexons through which small molecules diffuse between neighboring cells. Connexons consist of homo- or heterohexameric arrays of connexins, and the connexon in one plasma membrane docks end-to-end with a connexon in the membrane of a closely apposed cell [Maeda et al., 2009]. The connexin 4 TMS topology is well established.

Similar to members of the Connexin family, gap junction-forming Innexin Family (1.A.25) members are predicted and are known to have 4 TMSs. These proteins form intercellular gap junctional channels primarily in invertebrates that allow electrical coupling and free flow of small molecules between cells. As for the connexins, a 2 TMS duplication probably gave rise to the 4 TMS proteins. HMMTOP and MEMSAT sometimes erroneously predicted a fifth TMS. HMMTOP, in this case the least accurate program, predicted 8 proteins to have an extra TMS, but SPOCTOPUS, the most accurate program for this family, correctly predicted a 4 TMS topology for all family members.

The H+- or Na+-translocating Bacterial Flagellar Motor (Mot) Family (1.A.30.1) includes 5 out of 6 TC entries with an established 4 TMS topology, correctly predicted by HMMTOP and MEMSAT. SPOCTOPUS correctly predicted all 6 members of the subfamily to have a 4 TMS topology. The hydropathy plot revealed 2 broad TMSs at both ends of these proteins with a loop in between. These flagellar motor proteins contain clusters of charged residues at both termini, promoting non-covalent interactions between the two components of these motors, MotA and MotB.

Members of the Ca2+ Release-Activated Ca2+ (CRAC) Channel Family (1.A.52) also exhibit a 4 TMS topology. Hydropathy plots predict 4 TMS proteins with large loops between TMSs 3 and 4. When antigens stimulate immune cells, they trigger Ca2+ entry through these tetrameric channels that stimulate the immune response to pathogens. CRAC channel proteins exhibit a teardrop shape, each with a long, tapered cytoplasmic domain. These channels consist of tetramers formed upon Stim-induced dimerization of the Orai subunit [Matias et al., 2010].

Proteins in the Synaptic Vesicle-Associated Ca2+ Channel ‘Flower' Family (1.A.55) were predicted to have 3 or 4 TMSs. Synaptic vesicles promote neurotransmission in presynaptic terminals, regulated by Ca2+ [Yao et al., 2009]. The hydropathy plots for these proteins show 2 major broad peaks. The first of these peaks is always predicted to consist of 2 TMSs, but the second peak is sometimes predicted to be 1 and sometimes 2 TMSs. One of the family members (Flower) has been shown to have 4 TMSs [Yao et al., 2009].

5 TMS Channels

60 proteins were predicted to have 5 TMSs using HMMTOP, 101 using MEMSAT, and 104 using SPOCTOPUS. The family with multiple proteins predicted to have 5 TMSs will be discussed in this section.

The single channel family with multiple proteins predicted to have 5 TMSs is the ‘Tweety' Anion Channel Family (1.A.48), a recently identified family of channel proteins found in animals and plants. Three out of the five TC entries in the family appear to have a 5 TMS topology with HMMTOP, and all 5 members of the family are predicted to have 5 TMSs with both SPOCTOPUS and MEMSAT. These proteins contain 5 (or 6) TMSs in a probable arrangement: 2 + 2 + 1, with an extra N-terminal TMS present in some plant homologues. They produce large conductance chloride currents [He et al., 2008].

6 TMS Channels

142 proteins were predicted by HMMTOP to have 6 TMSs, 112 using MEMSAT and 134 using SPOCTOPUS, making this the largest topological type in class 1.A. Among the proteins predicted to have 6 TMSs, 2 families predominate: the VIC Family and the Major Intrinsic Protein (MIP) (1.A.8) Family. Using HMMTOP, the VIC Superfamily includes 32 TC entries predicted to have 6 TMSs and 15 to have 5. Most or all of the latter were incorrectly predicted and actually have 6. Most of them are K+ channels, and they usually consist of homotetrameric structures. Many voltage-sensitive K+ channels function with subunits that modify K+ channel gating. Non-integral subunits can be homologous to oxidoreductases that co-assemble with the tetrameric channel-forming subunits [Norris et al., 2010].

Ryanodine-Inositol 1,4,5-Triphosphate Receptor Ca2+ Channel (RIR-CaC) Family (1.A.3) members have either a 6 or an 8 TMS predicted topology. They are usually homotetrameric complexes. Pore-forming P-loop sequences occur between the fifth and sixth TMSs as for 6 TMS members of the VIC family. The ryanodine channels function in the release of Ca2+ from intracellular storage sites in animal cells, thereby regulating various Ca2+-dependent physiological processes. They are members of the VIC Superfamily [Chang et al., 2004; Du et al., 2002].

Seven proteins from the Transient Receptor Potential Ca2+ Channel (TRP-CC) Family (1.A.4) present a topology with 6 putative TMSs using HMMTOP, 14 members with MEMSAT and 21 members using SPOCTOPUS. The topological prediction varies with the most common being 5 TMSs. Nevertheless, they all probably have 6 TMSs. This family can be divided into 7 subfamilies that all share a common Ca2+ (cation) channel function. As cellular sensors, TRP channels are activated by a variety of different stimuli and function as signal integrators [Latorre et al., 2009].

The VIC family is the dominant family predicted to have 7 TMSs with 31 proteins of the 78 proteins in the category using HMMTOP; SPOCTOPUS and MEMSAT predict 1 and 2 members respectively to have 7 TMSs. However, almost all these predictions are overpredictions. The P-loop between TMSs 5 and 6 is counted as a TMS, erroneously predicting 7 instead of 6 TMSs, especially by HMMTOP.

The Major Intrinsic Protein (MIP) Family of aquaporins and glycerol facilitators (1.A.8) has 55 of 68 members correctly predicted to have 6 TMSs in a 3 + 3 arrangement due to a single intragenic duplication event [Park and Saier, 1996] using HMMTOP, and 64/68 using MEMSAT and SPOCTOPUS. Two proteins were predicted to have 5 TMSs, in one case because TMS 1 was missed, and in the other because TMS 3 was missed; 10 proteins were predicted to have 7 TMSs using HMMTOP. One of these proteins, the Major Intrinsic Protein (MIP), makes up about 60% of the proteins in the lens of the eye. During lens development, MIP becomes proteolytically truncated. These truncated tetramers form intercellular adhesive junctions, yielding a crystalline array that mediates lens formation [Gonen and Walz, 2006].

The Glutamate-Gated Ion Channel (GIC) Family of Neurotransmitter Receptors (1.A.10), members of the VIC Superfamily, have a topology with 1 TMS at the N-termini and the remaining 5 TMSs near their C-termini. The extracellular amino-terminal domain, S1, and the large extracytoplasmic loop domain between TMSs 2 and 3, bind the neurotransmitter, which regulates channel formation and ion selectivity [Gouaux, 2004]. There are three types of GIC receptors [Mayer, 2006]. HMMTOP predicted 4 proteins to have 6 TMSs, while MEMSAT and SPOCTOPUS predicted 0 proteins to have 6 TMSs. When analyzing mispredictions, HMMTOP predicted 4 proteins to have 5 TMSs, MEMSAT predicted 7 proteins to have 4 TMSs, and SPOCTOPUS predicted 13 proteins to have 3 TMSs. In the cases of the erroneous predictions, the programs frequently counted the P-loop and another minor peak that may or may not be TMSs. Four narrow TMSs in the middle of the sequence and a lone TMS at the N-terminus are displayed using the WHAT program for several of these proteins. One protein has the opposite arrangement: the lone TMS is at the C-terminus, while the 4 other putative TMSs are located at a position similar to that of the other proteins. In this case, HMMTOP proved most reliable, followed by MEMSAT and SPOCTOPUS in that order.

Members of the Small Conductance Mechanosensitive Ion Channel (MscS) Family (1.A.23) comprise a group of topologically diverse proteins with a well-characterized function: osmotic adaptation. These proteins are predicted to have 2, 4, 5, 8, 10, 11, 12 and 13 TMSs. The X-ray crystal structure of an E. coli MscS allowed prediction of the types of motions these proteins undergo [Bass et al., 2002; Wang et al., 2008]. The structure also provides a framework to address the mechanism of tension sensing that is defined by channel-lipid interactions.

The Urea/Amide Channel (UAC) Family (1.A.29) has 4 members with 6 putative TMSs using HMMTOP and SPOCTOPUS, and 2 members with 6 putative TMSs using MEMSAT. These proteins exhibit 6 broad peaks of hydrophobicity corresponding to the 6 predicted TMSs. They are encoded within operons that also encode ureases or amidases in bacteria.

7 TMS Channels

The Transient Receptor Potential Ca2+ Channel (TRP-CC) Family (1.A.4) within the VIC Superfamily includes 11 proteins predicted to have 7 TMSs near the C-termini using HMMTOP, 9 using MEMSAT, and 4 using SPOCTOPUS. TRP channels comprise distinct categories of cation channels that are either highly permeable to Ca2+, non-selective, or Ca2+ impermeable. Of these proteins, all probably have 6 TMSs; the 7 TMS prediction results because the P-loop is often considered transmembrane.

The Polycystin Cation Channel (PCC) Family (1.A.5), still another member of the VIC Superfamily, has many predicted topologies, but a 6 or 7 TMS topology is probably correct. Polycystin 1 contains 16 polycystic kidney disease l (PKD) domains, one LDL-receptor class A domain and one C-type lectin family domain [Gallagher et al., 2010]. These proteins exhibit 1 TMS at their N-termini with the rest at the C-termini.

The Homotrimeric Cation Channel (TRIC) Family (1.A.62) mediates efficient Ca2+ mobilization from intracellular stores through Ca2+ release channels. Its members present a topology with an intragenic duplication of a three-TMS polypeptide-encoding genetic element followed by a seventh TMS at their C-termini [Silverio and Saier, 2011].

9 TMS Channels

Members of the Calcium-Dependent Chloride Channel (Ca-ClC) Family (1.A.17) are important for the survival of animals. These channels are required for normal electrolyte and fluid secretion, olfactory perception, and neuronal and smooth muscle excitability in animals [Yang et al., 2008]. They generally have 9 TMSs; an 8 TMS prediction is probably incorrect.

All members of the Presenilin Endoplasmic Reticular Ca2+ Leak Channel (Presenilin) Family (1.A.54), accountable for about 40% of familial Alzheimer's disease cases, are predicted to have 9 TMSs [Tu et al., 2006] using HMMTOP. MEMSAT predicts 5/7 members to have 9 TMSs, and SPOCTOPUS predicts none of the members correctly. All of these proteins have a 9 TMS topology [Laudon et al., 2005]. They resulted from a 3 TMS triplication. A hydrophilic domain follows the first 6 TMSs, and then 3 more TMSs follow. A distant member, signal peptide peptidase-2A, was predicted to have 8 TMSs, but the HMMTOP program missed an N-terminal TMS. The order of accuracy of the three programs was HMMTOP > MEMSAT > SPOCTOPUS.

11 and 12 TMS Channels

Ammonia Channel Transporter (Amt) Family (1.A.11) members have dual functions, transporting NH3 or NH4+ and regulating nitrogen metabolism by directly interacting with regulatory proteins such as the E. coli PII protein and its homologue, GlnK. They are sometimes thought of as gas channels with two structurally similar halves that span the membrane with opposite polarity [Khademi et al., 2004]. HMMTOP predicts 17/28 members to have 11 TMSs, and 11/28 members to have 12 TMSs. MEMSAT predicts 21/28 members to have 11 TMSs and 7/28 members to have 12 TMSs. Finally, SPOCTOPUS predicts 20/28 proteins to have 11 TMSs and 8/28 proteins to have 12 TMSs. The 11 TMSs (M1-M11) of the E. coli and archaeal AmtB proteins for which X-ray crystal structures are available form a right-handed helical bundle surrounding each channel [Andrade et al., 2005; Khademi et al., 2004]. Probably all members of this family have 11 TMSs.

18-25 TMS Channels

The only protein outside of the VIC Superfamily that was predicted to have 20 TMSs is the Kidney Vasopressin Regulated Urea Transporter (1.A.28.1.1) in the Urea Transporter (UT) Family (1.A.28). Most of the UT proteins vary in size from 380 to 400 residues and exhibit 10 putative TMSs, but mammalian urea transporters such as UT-A1 of the rat are 920-930 residues long and exhibit an internal duplication yielding a total of 20 TMSs.

Many Ca2+ and Na+ channels of the VIC Superfamily (1.A.1) have 24 TMSs due to quadruplication of a 6 TMS unit, but a few Ca2+ and Na+ channels have 12 TMSs due to duplication. The HMMTOP program mispredicted many of these protein topologies [Nelson et al., 1999]. MEMSAT afforded the best accuracy with these proteins, correctly predicting 8 to have 24 TMSs. SPOCTOPUS predicted only 2 proteins to have 24 TMSs. Errors are generally due to overpredictions; for instance, SPOCTOPUS predicted 13 proteins to have more than 25 TMSs, and MEMSAT predicted a total of 10 proteins to have more than 25 TMSs.
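The repeat-unit arithmetic used in this section (a basic unit of a few TMSs multiplied by intragenic duplication, sometimes with additional TMSs outside the repeats) can be written as a small sketch. The helper `expected_tms` is a hypothetical name introduced for illustration; the unit sizes and copy numbers are those stated in the text.

```python
def expected_tms(unit: int, copies: int, extra: int = 0) -> int:
    """TMS count expected from `copies` repeats of a `unit`-TMS element,
    plus `extra` TMSs that are not part of a repeat."""
    if unit < 1 or copies < 1 or extra < 0:
        raise ValueError("unit and copies must be >= 1, extra must be >= 0")
    return unit * copies + extra

examples = {
    "VIC Ca2+/Na+ channel (quadruplication)": expected_tms(6, 4),
    "VIC Ca2+/Na+ channel (duplication)": expected_tms(6, 2),
    "Mammalian urea transporter (duplication)": expected_tms(10, 2),
    "TRIC (3 + 3 + 1)": expected_tms(3, 2, extra=1),
}

for name, n_tms in examples.items():
    print(f"{name}: {n_tms} TMSs")
```

A prediction that departs from the expected count, such as more than 25 TMSs for a 24 TMS channel, is a candidate overprediction that can then be checked against the hydropathy plot.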

>25 TMS Channels

Only one family of channel proteins included members with >25 TMSs. This is the Mechanical Nociceptor Piezo Family (1.A.75). These proteins are believed to be cation-selective channels that mediate responses to noxious mechanical stimuli [Coste et al., 2010; Kim et al., 2012]. The proteins were predicted to have 30, 37, 39, 41 and 43 TMSs with HMMTOP, 26, 35, 37, 38, 39, and 40 TMSs using MEMSAT, and 21, 30, 31, 33 and 37 TMSs using SPOCTOPUS. Examination of the largest of these (1.A.75.1.6) revealed that this protein consists of several domains, possibly an internally repeated sequence, each exhibiting about 7 putative TMSs. At the C-termini of these proteins, DUF3595 domains were identified. These proteins can be found in a wide range of eukaryotes including plants, animals, protozoans, slime molds and ciliates, but not in prokaryotes. It is likely that all of the programs were poorly predictive for members of this family, but HMMTOP may have predicted most accurately.

Holins (TC Subclass 1.E)

Subclass 1.E includes 53 families of putative Holin proteins (table 4). This subclass was analyzed collectively for topological types with the TMStats program without auxiliary proteins on 5/29/2013. A total of 323 proteins were analyzed. The average topology as determined by HMMTOP was 2.2 ± 1.0 SD while the average topologies calculated by MEMSAT and SPOCTOPUS were 1.9 ± 1.0 SD and 1.9 ± 1.1 SD, respectively.

Table 4

Topological distribution of holins in TCDB according to family


The distribution was revealed in a plot of predicted topological types with the frequency of occurrence on the Y-axis and the number of TMSs on the X-axis (fig. 3). Interestingly, MEMSAT predicted more proteins with 1 and 2 TMSs, and fewer proteins with 3 and 4 TMSs compared to the other two programs. More proteins were predicted to have 1 TMS by MEMSAT and SPOCTOPUS, but more proteins appeared to have 3 or 4 TMSs when HMMTOP was used. Overall the order of topological types was 1 TMS > 2 TMSs > 3 TMSs > 4 TMSs. Members of 4 or 5 families (depending on the prediction algorithm used) had 4 TMSs (table 4). There are probably no holins with 5 or more TMSs.

Fig. 3

Comparative distribution of topological types of holins predicted using the TMStats program for HMMTOP in black, MEMSAT in white and SPOCTOPUS in grey. These proteins were included in subclass 1.E of TCDB as of 5/29/2013.


1 TMS Holins

The proteins with 1 TMS were analyzed first. Several families were predicted to contain proteins with 1 TMS (table 4). The first of them is the T4 Holin Family (1.E.8). T4 holin is hydrophilic with 49 acidic and basic residues that promote its function as a holin-endolysin system for cell lysis. The lone TMS resides near the N-terminus. The T7 Holin Family (1.E.6) similarly exhibits a 1 TMS topology, as does the φAdh Holin Family (1.E.12).

The BlyA Holin Family (1.E.17) also exhibits a 1 TMS topology. BlyA and the BlyB soluble accessory protein are encoded on the conserved cp32 plasmid of Borrelia burgdorferi. BlyA can promote endolysin-dependent lysis of an induced lambda lysogen that is defective for the lambda holin S gene [Damman et al., 2000]. The Pseudomonas aeruginosa Hol Holin Family (1.E.20) has 1-2 TMSs. Hol by itself, in a broad host-range expression vector under IPTG control, exhibits strong lytic activity, but expression of both Hol and Lys together induces lysis under conditions where neither one alone is effective [Nakayama et al., 2000].

2 TMS Holins

The P21 Holin S Family (1.E.1) has 2 TMSs with both the N- and C-termini on the cytoplasmic side of the inner membrane of E. coli. It functions in the export of an endolysin, but the holin channel also allows release of small ions and metabolites, thereby promoting cell death. The HP1 Holin Family (1.E.7) includes members that aid in the release of lysozymes to the peptidoglycan wall. They have 2 broad hydrophobic peaks and a positively charged C-terminus within the short sequence.

The T4 Immunity Holin Family (1.E.9) is best known for its function in blocking DNA entry into the bacterial cytoplasm [Labrie et al., 2010]. Although T4 Holins usually have 2 TMSs, 1 family member has 3. TMSs 1 and 2 are homologous to the 2 TMSs of other family members. Members of the Bacillus subtilis φ29 Holin Family (1.E.10) with 2 broad hydrophobic peaks aid in cell lysis. φ11 Holin Family (1.E.11) members are hydrophobic peptides with 2 TMSs that similarly exhibit inner membrane disruptive activity. The 2 narrow peaks of hydrophobicity, corresponding to TMSs, are found near the N-termini.

The Lactococcus lactis Phage r1t Holin (1.E.18) has 2 TMSs separated by a short β-turn region. The r1t genome includes two adjacent genes, Orf48 and Orf49, encoding a holin and a lysin. The Bacteriophage Dp-1 Holin (1.E.24) is encoded with a lytic phage enzyme that shows an operon organization similar to those of Streptococcus pneumoniae and its bacteriophage [Sheehan et al., 1997]. The φU53 Holin (1.E.13) and the ArpQ Holin (1.E.15) both exhibit 2 TMS topologies.

3 TMS Holins

Phage Lambda Holin S (1.E.2) has a 3 TMS topology with the N-terminus in the periplasm and the C-terminus in the cytoplasm. Two products of the same gene have opposite functions: pore formation (S105), and blockage of pore formation (S107). They have 3 evenly spaced TMSs [Graschopf and Blasi, 1999], and the single pore formed has a large diameter [Savva et al., 2008]. The ratio of these two gene products determines the timing of cell lysis. Holin S is expressed at a specific time after phage infection terminates.

Another 3 TMS family includes the PRD1 Phage P35 Holin (1.E.5), an element of the P35 holin-endolysin system. The P35 holin has 3 TMSs with charged residues in the loop regions [Rydman and Bamford, 2003]. Members of the Listeria Phage A118 Holin Family (1.E.21) exhibit a 3 TMS topology with broad peaks of hydropathy evenly spaced. Hol118 appears in the cytoplasmic membrane shortly after infection. A second shorter translation product, which like the Lambda phage S105 protein, has a different translational start site at position 40, lacks the first TMS and inhibits pore formation [Vukov et al., 2003].

The Bacillus Spore Morphogenesis and Germination Holin Family (1.E.23) also has members exhibiting a 3 TMS topology. Involved with spore morphogenesis and germination, its absence results in spores lacking the usual striatal pattern, and the outer coat fails to attach to the underlying inner coat [Real et al., 2005]. Other families that include members exhibiting a 3 TMS topology are the P2 Holin TM Family (1.E.3), the LydA Holin Family (1.E.4), and the Cph1 Holin (Cph1 Holin) Family (1.E.16).

4 TMS Holins

Four or 5 families have holins that appear to contain 4 TMSs according to HMMTOP, MEMSAT and SPOCTOPUS. The most prevalent of these is the LrgA Holin Family (1.E.14). LrgA is a murein hydrolase exporter, and homologues are present in large numbers of bacteria (both Gram-positive and Gram-negative) as well as archaea. These proteins function in programmed cell death that is analogous to apoptosis in eukaryotes [Bayles, 2003]. The 4 TMSs arose by duplication of a 2 TMS precursor.

The Clostridium difficile TcdE Holin (1.E.19) is a 4 TMS protein. This organism produces two large toxins, both encoded within a pathogenicity locus; the tcdE gene is sandwiched in between the two toxin genes [Tan et al., 2001]. Both toxins may be released via TcdE. This action can lead to the death of E. coli cells.

Pore-Forming Toxins (TC Subclass 1.C)

411 proteins in TCDB were listed as pore-forming toxins in subclass 1.C as of 5/29/2013. The average number of putative TMSs for these proteins is 0.68 ± 0.82 using HMMTOP, 1.0 ± 0.56 using MEMSAT, and 0.63 ± 0.68 using SPOCTOPUS. With HMMTOP, 205 proteins were predicted to have 0 TMSs, 151 were predicted to have 1 TMS, 42 to have 2, 9 to have 3, and 4 to have 4. Using MEMSAT, 17 were predicted to have 0 TMSs, 357 were predicted to have 1 TMS, 25 were predicted to have 2 TMSs and 9 were predicted to have 3 TMSs. Finally, with SPOCTOPUS, 193 proteins were predicted to have 0 TMSs, 180 proteins were predicted to have 1 TMS, 29 proteins were predicted to have 2 TMSs, and 6 proteins were predicted to have 3 TMSs (see fig. 4). The proteins predicted to have 4 TMSs are in the two-component Bacterial Type III-Target Cell Pore (IIITCP) Family (1.C.36). As noted above, MEMSAT does not give reliable results for 0, 1 and 2 TMS proteins.
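The HMMTOP average quoted above can be reproduced from the HMMTOP distribution given in the same paragraph. The following is a minimal sketch, assuming a population standard deviation (the text does not state whether a population or sample SD was used; for these counts both round to the same value). The function name `mean_sd` is an illustrative assumption.

```python
from math import sqrt

# HMMTOP distribution for subclass 1.C, per the text: {TMSs: proteins}
hmmtop_hist = {0: 205, 1: 151, 2: 42, 3: 9, 4: 4}

def mean_sd(hist):
    """Return (n, mean, population SD) of a {value: count} histogram."""
    n = sum(hist.values())
    mean = sum(value * count for value, count in hist.items()) / n
    var = sum(count * (value - mean) ** 2 for value, count in hist.items()) / n
    return n, mean, sqrt(var)

n, mean, sd = mean_sd(hmmtop_hist)
print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")
# prints: n = 411, mean = 0.68, SD = 0.82
```

The same function can be applied to any of the per-subclass distributions tabulated by TMStats.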

Fig. 4

Comparative distribution of topological types predicted using the TMStats program for HMMTOP in black, MEMSAT in white and SPOCTOPUS in grey, for the proteins included in subclass 1.C of TCDB as of 5/29/2013.


0 TMS Toxins

Examination of the proteins predicted to have 0 TMSs revealed that many of their hydropathy profiles displayed substantial peaks of hydrophobicity. For example, all members of the Channel-Forming δ-Endotoxin Insecticidal Crystal Protein (ICP) Family (1.C.2) exhibit 2 striking peaks of hydrophobicity near their N-termini while the remainder of these proteins are hydrophilic. Members of the α-Hemolysin Channel-Forming Toxin (αHL) Family (1.C.3) exhibit a single N-terminal peak of hydrophobicity, possibly representing the signal sequence for export via the general secretory pathway (3.A.5). Members of the Aerolysin Channel-Forming Toxin Family (1.C.4) lack hydrophobic peaks of sufficient magnitude to pass through the membrane as α-helices. Members of the Botulinum and Tetanus Toxin (BTT) Family (1.C.8) exhibit only one hydrophobic peak centrally located in these polypeptide chains. Members of the Pore-Forming RTX Toxin Family (1.C.11) are predicted to have 0, 1 or 2 TMSs based on hydropathy plots. However, all members of this family exhibit three hydrophobic peaks in their central domains, the first being the smallest and the last one being the largest.

RTX toxins exhibit tremendously varied sizes, ranging from 300 residues to about 3,000 residues. The same was observed for the Clostridial Cytotoxin (CCT) Family (1.C.57), which is also a member of the RTX superfamily. Members of the small peptide Magainin (Magainin) Family (1.C.16) were predicted to have either 1 or 0 TMSs, but all of these small proteins exhibit an N-terminal signal sequence, specifying export via the general secretory pathway (3.A.5). These examples confirm that the TMStats program, based on HMMTOP, MEMSAT and SPOCTOPUS, provides approximate values in predicting TMSs but cannot be considered to be highly accurate. Every family must be considered separately, as some of these programs are more reliable for some families while others are more reliable for other families. Using the AveHAS program, predictions can be much more accurately verified.

1 TMS Toxins

Most members of the Channel-Forming Colicin Family (1.C.1) are predicted to have a single TMS. These proteins exhibit a single broad hydrophobic peak at their extreme C-termini, but in some cases these peaks split into two predicted TMSs. Most members of the Channel-Forming ε-Toxin Family (1.C.5) exhibit a single N-terminal hydrophobic peak, undoubtedly corresponding to the export signal sequence. Similarly, all members of the Thiol-Activated Cholesterol-Dependent Cytolysin (CDC) Family (1.C.12) exhibit a single N-terminal signal TMS. Again, when members of the Membrane Attack Complex/Perforin (MACPF) Family (1.C.39) were examined, a single N-terminal peak of hydrophobicity was observed. These 2 families belong to a single superfamily and are therefore homologous. Although they are from prokaryotes and eukaryotes, respectively, it would appear that both are secreted via the general secretory (Sec) pathway [Saier et al., 2008]. Further examination of proteins predicted to exhibit a single TMS showed that the majority of these occur at the extreme N-termini of the proteins. Most of these toxins are secreted to the external medium whereupon they undergo massive conformational changes when they insert into the membranes of their target cells.

2 TMS Toxins

Among the proteins that were predicted to have 2 TMSs were members of the Pore-Forming Haemolysin E (HlyE) Family (1.C.10). These family members are about 300 residues in length. In this family we find proteins with 2 peaks of hydrophobicity separated by about 100 residues. Although all members of this family exhibit 2 peaks of hydrophobicity, the programs in use do not always predict them to be transmembrane.

The Cecropin (Cecropin) Family (1.C.17) and the Melittin (Melittin) Family (1.C.18) contain members that are predicted to have 1 or 2 TMSs. However, all members show 2 hydrophobic peaks, the first being the targeting signal sequence, and the second being the single TMS in the mature protein that comprises the oligomeric channel [Bechinger, 1997]. The Pediocin Family (1.C.24), the Lactacin X Family (1.C.26), the Divergicin A Family (1.C.27) and the Bacteriocin AS-48 Cyclic Polypeptide Family (1.C.28), all members being bacteriocins, exhibit similar characteristics with 2 putative TMSs. The Cecropin and Melittin superfamilies have recently been shown to include proteins that are homologous to each other [A.J. Le and M.H. Saier, unpubl. results].

Toxins with >2 TMS

The Bacterial Type III-Target Cell Pore (IIITCP) Family (1.C.36) includes members that exhibit from 0 to 4 predicted TMSs. These systems consist of two non-homologous proteins, one predicted to have 0 or 1 TMS, while the other is predicted to have 2-4 TMSs. These proteins insert into the membrane of the target animal or plant cell to facilitate injection of bacterial proteins into the eukaryotic cells via a type III protein secretion system (injectisome). Most of the larger proteins exhibit 2 striking centrally localized peaks of hydrophobicity, the first broad, probably encompassing 2 TMSs, and the second sharp, almost always predicted to be a single TMS. We suggest that these toxins have 3 TMSs. While the IpaB protein (1.C.36.3.1) is likely to have 3 TMSs, another, BopB (1.C.36.4.1), is homologous to other members of this family except that it has 2 additional hydrophobic peaks C-terminal to the usual 3 TMSs, common to all members of this family.

Porters (Uniporters, Symporters, and Antiporters; TC Subclass 2.A)

A histogram of predicted topologies generated using the HMMTOP prediction algorithm for all proteins included in TC subclass 2.A revealed that, for the 2,582 proteins, the average predicted topology was 10.5 ± 2.9 TMSs (mean ± SD). With the MEMSAT and SPOCTOPUS algorithms, the means and standard deviations of predicted topologies were 10.5 ± 2.7 and 10.0 ± 2.8 TMSs, respectively. The largest numbers of proteins, 953, 968, and 765 for the three programs, respectively, contain 12 putative TMSs; however, proteins exhibiting 10-14 TMSs were prevalent (fig. 5) regardless of the prediction algorithm used. Of the proteins of smaller sizes, there is a peak of proteins exhibiting 6 TMSs, but substantial numbers of proteins display 7 through 9 TMSs with smaller numbers having 4 and 5 putative TMSs when SPOCTOPUS and MEMSAT were used; HMMTOP exhibited the opposite behavior, with greater numbers of proteins displaying 4 and 5 TMSs, and smaller numbers of proteins displaying 7 through 9 TMSs. These proteins were analyzed further.

Fig. 5

Comparative distribution of topological types predicted using the TMStats program for HMMTOP in black, MEMSAT in white and SPOCTOPUS in grey, for the proteins included in subclass 2.A of TCDB (secondary carriers) as of 5/29/2013.

1 or 2 TMS Porters

Most members of the Mitochondrial Inner Membrane K+/H+ and Ca2+/H+ Exchanger (LetM1) Family (2.A.97) are predicted to have 2 TMSs by HMMTOP; however, MEMSAT and SPOCTOPUS predict all 4 members of LetM1 to have 1 TMS. These proteins exhibit hydrophobic peaks near their N-termini, but in cases of members of human origin, they only display 1 TMS. These topological features are typical of channels, and this family is the only family of carriers reported to have fewer than 3 TMSs [Jiang et al., 2009]. In our opinion, the claim that these proteins function as carriers should be further investigated.

3 TMS Porters

Putative carriers predicted to have 3 TMSs by HMMTOP include members of the Mitochondrial tRNA Import Complex (M-RIC) Family (2.A.91) [Basu et al., 2008], the Bilirubin Transporter (BRT) Family (2.A.65) [Passamonti et al., 2005], and the Mitochondrial Pyruvate Carrier (MPC) Family (2.A.105) [Herzig et al., 2012]. MEMSAT and SPOCTOPUS predict members of all of these families to have 0 and 2 TMSs, respectively. The M-RIC Family contains a protein with over 600 residues that displays 3 probable TMSs at its N-terminus. The BRT family consists of a single functionally characterized protein in TCDB, the bilitranslocase, which exhibits 1 N-terminal TMS and two C-terminal TMSs. This protein has no homologues in the NCBI protein database, making its identity questionable. These proteins should be re-examined for their channel versus carrier properties, as there are very few putative carriers that have been reported to have just 3 TMSs.

The Mitochondrial Pyruvate Carrier (MPC) Family (2.A.105) contains 7 proteins that each include 3 putative TMSs [Herzig et al., 2012]. HMMTOP correctly predicted six 3 TMS proteins and one 4 TMS protein, while SPOCTOPUS and MEMSAT predicted 0 and 1 proteins to have 3 TMSs, respectively. SPOCTOPUS predicted two 0 TMS proteins, four 1 TMS proteins and one 2 TMS protein while MEMSAT predicted one 1 TMS protein, five 2 TMS proteins, and one 3 TMS protein. Since these proteins are known to have 3 TMSs, this is an example where HMMTOP proves most reliable of the three programs.

4 or 5 TMS Porters

In addition to the superfamilies described in more detail below, several families include members that were predicted to have 4 and 5 TMSs. The Cytochrome Oxidase Biogenesis (Oxa1) Family (2.A.9) consists of 9 proteins in TCDB. Using HMMTOP, one member was predicted to have 3 TMSs, 3 members were predicted to have 4 TMSs, 3 members were predicted to have 5 TMSs, as has been established experimentally for representative members [Sato and Mihara, 2009], and 2 members were predicted to have 6 TMSs. MEMSAT predicted 1 protein in this family to have 5 TMSs, 6 to have 6 TMSs, and 2 to have 7 TMSs. SPOCTOPUS predicted 1 protein to have 2 TMSs, and 2 proteins to each have 3, 4, 5 and 6 TMSs. Thus, HMMTOP performed best with this family, although no program proved particularly reliable for all members.

The 4 TMS Multidrug Endosomal Transporter (MET) Family (2.A.74) includes 5 proteins in TCDB, 1 predicted to have 3 TMSs, 2 predicted to have 4 TMSs, and 2 predicted to have 5 TMSs using HMMTOP. MEMSAT predicts 3 proteins to have 4 TMSs and 2 proteins to have 5 TMSs. SPOCTOPUS correctly predicts 4 proteins to have 4 TMSs, and only 1 protein to have 5 TMSs.

The Threonine/Serine Exporter (ThrE) Family (2.A.79) includes a protein (2.A.79.2.1) that is predicted to have 4 TMSs by all three programs, but examination of the WHAT plot, based on HMMTOP, suggests 5. These homologues can exist as full-length 10 TMS proteins or half-sized 5 TMS proteins. The Vitamin Uptake Transporter (VUT or ECF) Family (2.A.88) includes 4 members predicted to have 4 TMSs by HMMTOP, 0 by MEMSAT, and 3 by SPOCTOPUS. Most members of this family exhibit 5 or 6 TMSs as predicted by all three programs.

The Cation Diffusion Facilitator (CDF) Family (2.A.4) consists of 34 proteins in TCDB. 15 proteins of this family were predicted to have 5 TMSs by HMMTOP, 2 by MEMSAT and 9 by SPOCTOPUS. However, many of these proteins are known to have 6 TMSs with three 2-TMS repeats [Matias et al., 2010]. Analysis of proteins predicted to have 5 TMSs revealed that every one of these proteins exhibits 6 hydrophobic peaks, one of which was missed by the various programs. HMMTOP correctly predicted 12 proteins to have 6 TMSs, MEMSAT correctly predicted 29 proteins to have 6 TMSs, and SPOCTOPUS correctly predicted 20 proteins to have 6 TMSs. A single member of the Tellurite-Resistance/Dicarboxylate Transporter (TDT) Family (2.A.16) exhibits 5 TMSs with a long hydrophilic C-terminus by HMMTOP. Eleven of the remaining members of the family have 10 TMSs with two 5-TMS repeats as displayed by HMMTOP. In comparison, MEMSAT predicts 9 proteins to have 10 TMSs, and SPOCTOPUS predicts only 3 proteins to have 10 TMSs. The ATP-dependent subtelomeric helicase, RecQ of Schizosaccharomyces pombe (2.A.16.2.2; 2,100 amino acids), has a 5 TMS N-terminal domain (residues 43-210). It is 94% identical to 2.A.16.2.1, a malate transporter of the same species. It is not known whether RecQ catalyzes transport, but this close similarity certainly suggests a transport function. The Aromatic Acid Exporter (ArAE) Family (2.A.85) includes members of varying predicted topologies, the most prevalent being 5 and 10 TMSs as predicted by HMMTOP, 5 and 10 TMSs by MEMSAT, and 6 and 8 TMSs by SPOCTOPUS. Other families in which 5 TMS members were identified were in agreement with previously known or predicted topologies.

6 or 7 TMS Porters

Many proteins were found to exhibit 6 TMSs, and 6 TMSs is a common topology for transporters. Most of the proteins predicted to have 6 TMSs belong to families known to consist of 6 TMS members, but a few consist of 7 TMS proteins where 1 TMS was missed by the prediction programs. A few proteins were identified that belong to families with most members exhibiting more TMSs, but in these cases the TC entries were shown to be truncated sequences and the erroneous TC entries were replaced in TCDB by the correct sequences. The same considerations proved to be true for the 7 TMS proteins. As for the 6 TMS proteins, several of the proteins predicted to have 7 TMSs belong to families that have been discussed above.

8 TMS Porters

Proteins with 8 TMSs were much less numerous than 6 TMS proteins, but there were more 8 than 7 TMS proteins, regardless of the prediction program used, indicating that the 8 TMS topology is common among transporters. One such family is the K+ Transporter (Trk) Family (2.A.38) where an 8 TMS topology has been established for 3 members of the family [Kato et al., 2001; Zeng et al., 2004] as discussed above.

Porters with ≥ 10 TMSs

The following superfamilies with members having multiple TMSs were studied in greater detail:

- The Major Facilitator (MFS) Superfamily (2.A.1 plus others; see ‘Superfamily' link in TCDB)

- The Amino Acid-Polyamine-Organocation (APC) Superfamily (2.A.3 plus others)

- The Resistance-Nodulation-Cell Division (RND) Superfamily (2.A.6)

- The Drug/Metabolite Transporter (DMT) Superfamily (2.A.7)

- The Mitochondrial Carrier (MC) Superfamily (2.A.29)

- The Multidrug/Oligosaccharidyl-lipid/Polysaccharide (MOP) Flippase Superfamily (2.A.66).

The Major Facilitator Superfamily (MFS; 2.A.1)

TMStats seldom showed anomalous predictions for the Major Facilitator Superfamily (MFS; 2.A.1), as noted for the Sugar Porter Family discussed above, regardless of the program used. The MFS includes 652 proteins in TCDB as of 5/29/2013, and according to the TM distribution histogram, it showed an average topology of 12.2 ± 1.1 TMSs using HMMTOP, 12.0 ± 1.2 TMSs with MEMSAT, and 11.4 ± 1.7 TMSs with SPOCTOPUS.

474 (72%) MFS carriers were predicted to have 12 TMSs while 63 (10%) were predicted to have 11 and another 57 (9%) were predicted to have 14 TMSs (fig. 5) using HMMTOP. MEMSAT predicted 447 (69%) to have 12 TMSs, 76 (12%) to have 11 TMSs, and 69 (11%) to have 14 TMSs. SPOCTOPUS predicted only 330 (51%) to have 12 TMSs, 81 (13%) to have 11 TMSs, and 54 (8%) to have 14 TMSs. Examination revealed that most proteins predicted to have 11 TMSs actually have 12, but the programs missed 1 of them. Proteins predicted to have 10 TMSs also appeared to have 12 TMSs where the programs missed 2. Those predicted to have 13 TMSs probably had either 12 or 14 TMSs. Most of the MFS permeases predicted to have 14 proved to have 2 extra TMSs separating the two 6 TMS repeat units [Paulsen et al., 1996]. Only one protein each with 8, 17, 18 and 24 TMSs was detected with HMMTOP, and one protein each with 8, 18, 18, and 24 TMSs detected with MEMSAT. SPOCTOPUS detected 20 proteins with 8 TMSs, 1 protein with 17 TMSs, and 1 protein with 24 TMSs. Very few proteins were predicted to have 5, 6 or 7 TMSs by any of the programs, suggesting that few, if any, half-sized (6 TMS) MFS proteins are included in TCDB. However, a family of lysyl tRNA synthetases (9.B.111) includes 5 or 6 TMS N-terminal sequences that are clearly related to the second halves of MFS carriers of the DHA2 family (2.A.1.3) of the MFS. The proteins predicted to have 8 TMSs could be interpreted as 12 TMS proteins. In these proteins, 2 TMSs within each of four hairpin structures were so close together that the programs predicted the structure to be a single TMS rather than 2 TMSs in each hairpin.

Three proteins were predicted to have 16 TMSs by HMMTOP, one by MEMSAT, and zero by SPOCTOPUS; the proteins predicted to have 16 TMSs by HMMTOP are located within the DHA2 family (2.A.1.3), most members of which are known to have 14 TMSs. Two small peaks of hydrophobicity in the C-terminal region were predicted to be TMSs in these proteins but not in other members of this family. Their actual topology is most likely to be 14 TMSs, but this must be determined experimentally. The single protein predicted to have 24 TMSs by HMMTOP and MEMSAT but 22 TMSs by SPOCTOPUS belongs to the Nitrate:Nitrite Porter (NNP) Family (2.A.1.8.11) and has 24 established TMSs. This protein, NarK, consists of two fused MFS permeases exhibiting two distinct but related functions. The first is a nitrate:proton symporter, and the second is a nitrate:nitrite antiporter [Goddard et al., 2008]. Proteins predicted to have 17 and 18 TMSs prove to be fusion proteins where in one case, the fusion was to a 5 TMS sensor kinase domain, and in the other case, the MFS porter was fused to a 6 TMS YedZ domain [von Rozycki et al., 2004]. Another protein, 2.A.1.65.11, predicted to have 20 TMSs by TOPCONS, 16 TMSs by MEMSAT, and 10 TMSs by SPOCTOPUS, belongs to the Unidentified Major Facilitator (UMF)-14 Family and has 24 putative TMSs due to a full intragenic duplication event.

The Amino Acid-Polyamine-Organocation (APC) Family (2.A.3)

The Amino Acid-Polyamine-Organocation (APC) Family within the APC Superfamily (2.A.3) was studied in some detail because of the anomalous behavior exhibited by some of its members. The superfamily included 134 proteins in TCDB with an average number of TMSs equal to 12.2 ± 1.0 TMSs according to HMMTOP, 12.3 ± 1.2 TMSs according to MEMSAT, and 11.9 ± 1.2 TMSs according to SPOCTOPUS. 95 (71%) of the proteins were predicted to have 12 TMSs with HMMTOP, 87 (65%) with MEMSAT, and 95 (71%) with SPOCTOPUS. Six (4.5%) were predicted to have 10 TMSs with HMMTOP, 1 (0.8%) with MEMSAT, and 7 (5.2%) with SPOCTOPUS; all of these proteins proved to be homologous to the proteins containing 12 TMSs throughout most of their lengths. However, they differ from the 12 TMS proteins in that they lack approximately 100 residues containing the two C-terminal TMSs. Out of the proteins predicted to have 10 TMSs, HMMTOP correctly predicted 4 out of 6 to come from the Spore Germination Protein (SGP) Family (2.A.3.9), known to have 10 TMSs. MEMSAT incorrectly predicted the topologies of all SGP proteins, and SPOCTOPUS only predicted 1 SGP protein correctly. These proteins have lost their transport function and are apparently amino acid receptors providing signaling functions for spore germination. In this case, HMMTOP proved superior to MEMSAT and SPOCTOPUS.

Seven proteins (5.2%) were predicted to have 11 TMSs with HMMTOP, 9 with MEMSAT, and 6 with SPOCTOPUS. Examination revealed that these proteins actually possess 12 hydrophobic peaks, corresponding to predicted TMSs. In each of the putative 11 TMS proteins, all three programs missed a single hydrophobic peak, most frequently the TMS at their extreme N-termini. The 9 proteins predicted to have 13 TMSs by HMMTOP, 17 proteins by MEMSAT, and 4 proteins (3.0%) by SPOCTOPUS actually have 12 hydrophobic peaks, but the programs predicted an extra TMS in each case. The 14 proteins predicted to have 14 TMSs and 1 protein predicted to have 15 TMSs by HMMTOP, 2 proteins to have 14 TMSs and 12 proteins to have 15 TMSs by MEMSAT and 15 proteins to have 14 predicted TMSs by SPOCTOPUS are homologues of the proteins displaying 12 TMSs with extensions at their C-termini that contain the extra 2 (and in a few cases 3) putative TMSs. In this case, HMMTOP provided the most accurate predictions.

Assuming a 12 TMS topology with these few exceptions, only 39 out of 134 proteins (30%) by HMMTOP, 47 out of 134 by MEMSAT (35%), and 39 out of 134 by SPOCTOPUS (30%) were mispredicted. Thus, while HMMTOP and SPOCTOPUS correctly predicted the topologies of 84% of the proteins in the MFS, they predicted 70% of APC family members correctly; MEMSAT predicted 65% correctly. However, the situation is more complex. The APC Superfamily topological analyses are of particular interest because the high-resolution X-ray structures of several members of this superfamily have been solved [Gao et al., 2010; Lu et al., 2011]. In all cases, these proteins consist of 2 repeat units of 5 TMSs with two extra TMSs most frequently at the C-termini of these porters. In contrast to the MFS, the two halves of these proteins have an odd number of TMSs and consequent opposite orientations in the membrane [Reddy et al., 2012].
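
The accuracy figures quoted at the start of the preceding paragraph follow directly from the misprediction counts. The short Python sketch below redoes that arithmetic; the variable and function names are illustrative only and are not part of TMStats.

```python
# Misprediction counts quoted in the text for the 134 APC proteins in TCDB.
TOTAL_APC = 134
mispredicted = {"HMMTOP": 39, "MEMSAT": 47, "SPOCTOPUS": 39}


def percent_correct(n_wrong: int, total: int) -> float:
    """Percentage of proteins whose predicted topology matched the assumed one."""
    return 100.0 * (total - n_wrong) / total


# Hypothetical helper table: program name -> percent correctly predicted.
apc_accuracy = {prog: percent_correct(n, TOTAL_APC) for prog, n in mispredicted.items()}

for prog, pct in apc_accuracy.items():
    print(f"{prog}: {pct:.1f}% correct")
```

With these counts, HMMTOP and SPOCTOPUS come out at about 71% and MEMSAT at about 65%, in line with the rounded values of 70% and 65% given above.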

The Resistance-Nodulation-Cell Division (RND) Superfamily (2.A.6)

The Resistance-Nodulation-Cell Division (RND) Superfamily (2.A.6) includes 97 proteins in TCDB with an average prediction of 11.6 ± 1.5 TMSs according to HMMTOP, 11.8 ± 1.3 TMSs according to MEMSAT, and 11.8 ± 1.3 TMSs according to SPOCTOPUS. 68 (70%) were predicted to have 12 TMSs with a clear repeat unit having a 1 + 5 arrangement using HMMTOP, 76 (78%) were predicted to have 12 TMSs by MEMSAT, and 77 (79%) were predicted to have 12 TMSs by SPOCTOPUS. Five homologues were predicted to have 13 TMSs by HMMTOP and SPOCTOPUS, and 4 homologues were predicted to have 13 TMSs by MEMSAT. They proved to be most similar to the Niemann-Pick C type (NPC) proteins from cellular and acellular slime molds. One of the NPC proteins (2.A.6.6.7) possesses N-terminal domains of about 400 amino acyl residues with 4 extra putative TMSs in a 1 + 3 arrangement to make a total of 16 TMSs. 10 (10%) were predicted to have 11 TMSs by HMMTOP and MEMSAT, and 8 (8.2%) were predicted to have 11 TMSs by SPOCTOPUS. These appeared to have the usual 12 TMS topology. The prediction algorithms missed 1 TMS in a region where 5 TMSs cluster tightly together.

There were 2 proteins with 6 putative TMSs with HMMTOP, 1 with MEMSAT, and 3 with SPOCTOPUS; they displayed placements of hydrophobic regions similar to those of their counterparts with 12 TMSs. Two of these proteins, the two correctly predicted by HMMTOP, are the SecD and SecF half-sized bacterial proteins of E. coli, which together comprise a full-length transporter of ill-defined function, but involved in protein secretion via the general secretory (Sec) pathway (TC #3.A.5), possibly acting to allow use of the proton motive force to drive ATP-independent protein translocation [Tsukazaki et al., 2011]. Another two proteins predicted to have 7 TMSs by HMMTOP and MEMSAT actually display 12 TMSs, but the two programs missed the last 5 TMSs at their C-termini; SPOCTOPUS did not predict any proteins to have 7 TMSs. Six proteins were predicted to have 9 TMSs with HMMTOP and one protein was predicted to have 9 TMSs with both MEMSAT and SPOCTOPUS. Two proteins displayed 14 TMSs with HMMTOP, while the MEMSAT and SPOCTOPUS programs only predicted one protein to have 14 TMSs. The three programs failed to pick up the second, seventh, and eighth TMSs for the proteins predicted to have 9 TMSs. The two proteins displaying 14 putative TMSs with HMMTOP had 2 moderately hydrophobic peaks near their C-termini.

The Drug/Metabolite Transporter (DMT) Superfamily (2.A.7)

Drug/Metabolite Transporter (DMT) Superfamily (2.A.7) members exhibit variable topologies. The superfamily includes 199 proteins in TCDB with an average size of 8.9 ± 2.0 TMSs according to HMMTOP, 8.7 ± 2.2 TMSs with MEMSAT, and 8.4 ± 1.9 TMSs with SPOCTOPUS. Recent bioinformatic data have led to the conclusion that a 2 TMS-encoding genetic element duplicated to 4 TMSs, added 1 TMS at the N-terminus to give 5 TMSs, and then duplicated to give 10 TMS proteins [Lam et al., 2011]. The pathway was thus: 2 → 4 → 5 → 10 TMSs. In fact, all of these topological types are found among current DMT family members. Of the 199 DMT proteins, 120 (60%) were predicted to have 10 TMSs with HMMTOP, 104 (52%) were predicted to have 10 TMSs by MEMSAT, and only 71 (36%) were predicted to have 10 TMSs by SPOCTOPUS. 6 (3%) appeared to have 5 TMSs, 17 (8.5%) may have 4, and 2 (1%) have 2 TMSs according to HMMTOP. MEMSAT predicted 1 to have 5 TMSs (0.5%), 13 (6.5%) to have 4 TMSs, and 2 (1%) to have 2 TMSs. SPOCTOPUS predicted 5 to have 5 TMSs (2.5%), 18 to have 4 TMSs (9%), and 2 (1%) to have 2 TMSs. The functions of the 2 TMS proteins are not known, but many bacteria have them, so they are not likely to be artifactual.
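
The proposed pathway can be restated as three successive operations on a TMS count: a duplication, the addition of one N-terminal TMS, and a second duplication. The Python sketch below is only a bookkeeping illustration of that sequence; the function names are hypothetical, and the HMMTOP counts are the ones quoted in the preceding paragraph.

```python
def duplicate(n_tms: int) -> int:
    """Intragenic duplication doubles the number of TMSs."""
    return 2 * n_tms


def add_n_terminal_tms(n_tms: int) -> int:
    """Addition of a single TMS at the N-terminus."""
    return n_tms + 1


# Order of events proposed for the DMT Superfamily, starting from a 2 TMS element.
events = [duplicate, add_n_terminal_tms, duplicate]
pathway = [2]
for event in events:
    pathway.append(event(pathway[-1]))

# Numbers of DMT proteins predicted by HMMTOP for each topology on the pathway.
hmmtop_counts = {2: 2, 4: 17, 5: 6, 10: 120}

for n_tms in pathway:
    print(f"{n_tms} TMSs: {hmmtop_counts[n_tms]} proteins predicted by HMMTOP")
```

Every intermediate on the pathway is represented among the HMMTOP predictions, which is the observation the paragraph above relies on.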

Nine proteins were predicted to have 3 TMSs with MEMSAT (none were predicted by HMMTOP or SPOCTOPUS), but careful examination revealed that these proteins probably have 4 TMSs; the N-terminal TMSs were repeatedly missed by MEMSAT. Based on WHAT program analyses, the proteins predicted to have 6, 7, 8 or 9 TMSs also appear to have 10 TMSs. The programs missed certain TMSs throughout these proteins. The proteins predicted to have 11 TMSs probably have 10 TMSs as well. However, 1 protein (2.A.7.11.2) predicted to have 12 TMSs proved to be homologous to the 10 TMS proteins except for an N-terminal extension that introduced 2 extra TMSs. The observations reported revealed that the order of correct predictions for the DMT Superfamily was HMMTOP > SPOCTOPUS > MEMSAT. Thus, while a particular topological prediction program is relatively reliable for some families of transport proteins, it can be less reliable for others, and unreliable for still others.

The Mitochondrial Carrier (MC) Superfamily (2.A.29)

The Mitochondrial Carrier Family (MC; 2.A.29) gave anomalous results when examined with HMMTOP but to a lesser degree when MEMSAT or SPOCTOPUS was used. This family included 129 proteins in TCDB, and according to the TM distribution histogram using HMMTOP, it showed an average of 4.3 ± 1.9 TMSs. MEMSAT predicted an average topology of 5.9 ± 0.5 TMSs, and SPOCTOPUS predicted an average topology of 5.7 ± 1.0 TMSs. Of HMMTOP's results, 8 were predicted to have 0 TMSs; six, 1 TMS; eleven, 2; eight, 3; thirty-four, 4; seventeen, 5; forty, 6, and five, 7. MEMSAT predicted one 2 TMS protein, one 3 TMS protein, two 4 TMS proteins, one hundred and twenty-three 6 TMS proteins, and two 7 TMS proteins. SPOCTOPUS predicted one 0 TMS protein, one 1 TMS protein, four 2 TMS proteins, one 3 TMS protein, four 5 TMS proteins, and one hundred and eighteen 6 TMS proteins. These proteins are known to consist of 3 repeat units, each having 2 TMSs [Kuan and Saier, 1993]. We were unable to come up with evidence for exceptions. Only 40 of the 129 proteins (31%) were correctly predicted to have 6 TMSs by the HMMTOP program; in comparison, MEMSAT correctly predicted 123/129 (95%) proteins and SPOCTOPUS correctly predicted 118/129 (91.4%) proteins. These proteins were therefore examined in greater detail to understand the reasons for this tremendous discrepancy.
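
The summary statistics in the preceding paragraph can be recomputed from the per-topology counts it lists. The Python sketch below does so, assuming that the reported spread is a sample standard deviation (the text does not say which estimator TMStats uses); the function and variable names are illustrative.

```python
from math import sqrt

# Number of MC family proteins (129 in total) predicted to have each TMS count,
# as listed in the text for the three programs.
mc_histograms = {
    "HMMTOP": {0: 8, 1: 6, 2: 11, 3: 8, 4: 34, 5: 17, 6: 40, 7: 5},
    "MEMSAT": {2: 1, 3: 1, 4: 2, 6: 123, 7: 2},
    "SPOCTOPUS": {0: 1, 1: 1, 2: 4, 3: 1, 5: 4, 6: 118},
}


def mean_and_sd(hist: dict) -> tuple:
    """Return (n, mean, sample SD) of the TMS counts described by a histogram."""
    n = sum(hist.values())
    mean = sum(tms * count for tms, count in hist.items()) / n
    variance = sum(count * (tms - mean) ** 2 for tms, count in hist.items()) / (n - 1)
    return n, mean, sqrt(variance)


mc_summary = {prog: mean_and_sd(hist) for prog, hist in mc_histograms.items()}

for prog, (n, mean, sd) in mc_summary.items():
    share_6 = 100.0 * mc_histograms[prog].get(6, 0) / n
    print(f"{prog}: n={n}, mean={mean:.1f} +/- {sd:.1f} TMSs, {share_6:.0f}% with 6 TMSs")
```

Each histogram sums to 129 proteins. The recomputed values are about 4.2 ± 1.9 for HMMTOP, 5.9 ± 0.5 for MEMSAT and 5.7 ± 1.0 for SPOCTOPUS, which agrees with the reported figures apart from a difference of 0.1 in the HMMTOP mean.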

All 7 members of the family predicted to have 0 TMSs by HMMTOP displayed 6 peaks of hydrophobicity with the WHAT program. However, the degrees of hydrophobicity of these peaks were frequently below the threshold for identification of a TMS by HMMTOP, which missed the N-terminal TMS most frequently and the third TMS least frequently. The statistics in terms of percent missed were TMS1 66%, TMS2 48%, TMS3 31%, TMS4 43%, TMS5 41%, and TMS6 41%. Overall, HMMTOP missed 45% of the TMSs for the MC Family.
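
The overall figure of 45% is consistent with the unweighted mean of the six per-TMS miss rates, as the brief Python check below shows. Treating the overall value as a simple average is an assumption, though it is the natural reading if all six positions were scored over the same set of proteins.

```python
# Percentage of MC family proteins in which HMMTOP missed each TMS (from the text).
missed_pct = {"TMS1": 66, "TMS2": 48, "TMS3": 31, "TMS4": 43, "TMS5": 41, "TMS6": 41}

overall_missed = sum(missed_pct.values()) / len(missed_pct)
most_missed = max(missed_pct, key=missed_pct.get)
least_missed = min(missed_pct, key=missed_pct.get)

print(f"overall: {overall_missed:.0f}% of TMSs missed")
print(f"most frequently missed: {most_missed}; least frequently missed: {least_missed}")
```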

The Multidrug/Oligosaccharidyl-Lipid/Polysaccharide (MOP) Flippase Superfamily (2.A.66)

At the time when this study was conducted, the Multidrug/Oligosaccharidyl-lipid/Polysaccharide (MOP) Superfamily (2.A.66) included 79 proteins in TCDB with an average size of 12.4 ± 1.1 TMSs according to HMMTOP, 12.0 ± 1.2 TMSs according to MEMSAT and 11.3 ± 1.7 TMSs according to SPOCTOPUS. 41 (52%) of the proteins were predicted to have 12 TMSs by HMMTOP, 37 (47%) of the proteins were predicted to have 12 TMSs by MEMSAT, and 34 (43%) of the proteins were predicted to have 12 TMSs by SPOCTOPUS. These proteins exhibit two duplicated halves of 6 TMSs based on bioinformatic and X-ray crystallography studies [He et al., 2010; Hvorup et al., 2003]. Careful examination of the proteins predicted to have 9, 10 or 11 TMSs revealed that all appear to have 12 TMSs; the prediction algorithms missed one or more hydrophobic peaks. The proteins predicted to have 13 or 14 TMSs are homologous to the proteins predicted to have 12 TMSs with extensions at either the N- or C-terminal ends of the sequences with either one or two putative TMSs.

We have compared nine programs to determine their topological prediction accuracies. Initially, we used four representative families where the protein topologies have been well established. While 2 of those families were predicted with reasonably high levels of confidence, the other two were not. For 3 of the 4 families, the order of accuracy was the same: SPOCTOPUS > MEMSAT > HMMTOP > TOPCONS > PHOBIUS > TMHMM > SVMTOP > DAS > SOSUI. The results indicated that SPOCTOPUS, MEMSAT and HMMTOP were the best performers, and these three programs were used in subsequent studies. All three prediction algorithms were incorporated into the novel TMStats program. Interestingly, as shown in table 5, the order of accuracy observed for these three programs is not the same when analyzing different families or superfamilies in subclass 2.A (carriers). The results show that in many cases, the HMMTOP algorithm outperforms its counterparts in prediction accuracy. Thus, when the MFS (2.A.1), APC (2.A.3), DMT (2.A.7), MOP (2.A.66) and MPC (2.A.105) families were examined, HMMTOP outperformed MEMSAT, and MEMSAT usually outperformed SPOCTOPUS. In fact, while HMMTOP was superior in 5 out of the 9 cases tabulated, SPOCTOPUS was superior in 3 and MEMSAT was superior in only 1. When small families of uniform topology were examined, the reverse trend was sometimes observed. Using the sum totals of the results generated in this study, we propose that MEMSAT and SPOCTOPUS often predict small families of proteins with greatest accuracy, as shown when the Sugar Porter, Mitochondrial Carrier (MC) and Potassium Transporter (Trk) families were examined, but that HMMTOP often excels at predicting larger, topologically heterogeneous superfamilies of proteins. MEMSAT was particularly unreliable in predicting proteins with 0, 1 or 2 TMSs.
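
To illustrate how the order of accuracy can change from one family to another, the Python sketch below ranks the three programs for three families, using only the counts of correctly predicted proteins quoted earlier in the text. It is a minimal illustration, not a reconstruction of table 5, and the data structure and function names are hypothetical.

```python
# family label -> (proteins in TCDB, proteins predicted with the established topology)
correct_counts = {
    "MC (2.A.29), 6 TMSs": (129, {"HMMTOP": 40, "MEMSAT": 123, "SPOCTOPUS": 118}),
    "CDF (2.A.4), 6 TMSs": (34, {"HMMTOP": 12, "MEMSAT": 29, "SPOCTOPUS": 20}),
    "MPC (2.A.105), 3 TMSs": (7, {"HMMTOP": 6, "MEMSAT": 1, "SPOCTOPUS": 0}),
}


def rank_programs(counts: dict) -> list:
    """Order the programs from most to fewest correct predictions."""
    return sorted(counts, key=counts.get, reverse=True)


family_ranking = {family: rank_programs(counts) for family, (_, counts) in correct_counts.items()}

for family, (total, counts) in correct_counts.items():
    ordered = " > ".join(
        f"{prog} ({100.0 * counts[prog] / total:.0f}%)" for prog in family_ranking[family]
    )
    print(f"{family}: {ordered}")
```

For these three families, MEMSAT ranks first for the MC and CDF families, whereas HMMTOP ranks first for the MPC family, so no single ordering of the programs holds across families.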

Table 5

Prediction algorithm accuracy rank for selected superfamilies in subclass 2.A

Two families that consistently resulted in poor predictions with most of the nine programs initially examined were the MC and the Trk families. In the case of the MC Family, erroneous predictions resulted from low hydropathy values for the individual TMSs, but even for the fairly hydrophobic peaks, 7 of the 9 programs often grossly underpredicted these TMSs. Only SPOCTOPUS and MEMSAT predicted these proteins with 90 and 80% accuracy, respectively (tables 1, 5). In the case of the Trk Family, the errors resulted from the prediction that P-loops, which dip into the membrane on one side and exit on the same side, are transmembrane. Even SPOCTOPUS and MEMSAT had low prediction accuracies, predicting only 50 and 35% of proteins correctly, respectively. Thus, different causes for the errors were observed for these 2 families. However, seven of the nine programs examined consistently made these same errors. These observations suggest a systematic problem shared by many topology prediction programs. They should provide impetus for the design of novel improved programs that can more accurately predict transmembrane protein topologies; improvements can be made to different prediction models by training new HMMs or SVMs to better understand and predict TMSs. It may also be beneficial to introduce family-specific programs for structural biologists and bioinformaticians to use, although it would be desirable to incorporate the new knowledge gained into a single, generalized program that could determine the best prediction algorithm to use with a given protein or family of proteins.

Table 6 summarizes the distribution of topological types among three types of channels (TC subclasses 1.A, 1.C and 1.E) as well as carriers (2.A); predictions based on the best three programs are reported. TC subclass 1.A channels include all α-type channels, except the small prokaryotic holins of subclass 1.E. Subclass 1.C includes secreted channel-forming toxins, and subclass 2.A includes all secondary carriers. Topological distributions for the same subclasses are shown in figures 2, 3, 4, 5. For subclass 1.A, 2 TMS channels outnumber putative 1 TMS channels, 4 TMS channels outnumber putative 3 TMS channels, and 6 TMS channels outnumber 5 or 7 TMS channels. However, 8 and 9 TMS channels are present in about equal numbers, while putative 11 TMS channels exceed the numbers of 10 TMS channels. Many proteins that are predicted to contain 11 TMSs are members of the Amt (1.A.11) family. The Amt family has been experimentally determined to have either 11 or 12 TMS proteins, with most members of this family containing 11. HMMTOP predicts 12 of these proteins to have 12 TMSs, which is more than either MEMSAT or SPOCTOPUS, which predict 7 and 8 proteins to have 12 TMSs, respectively. With all three programs, however, the most commonly predicted topology is 11 TMSs, reflecting established experimental data, with SPOCTOPUS and MEMSAT predicting 20 and 21 proteins to have 11 TMSs, respectively, and HMMTOP predicting 17 proteins to have 11 TMSs. For the remainder of the graph, the situation where channels of even numbers of TMSs generally outnumber those of odd-numbered TMSs recurs. Thus, there are more 16 TMS channels than 17 TMS channels, more 18 TMS proteins than 19, and more 20 TMS proteins than 21 (see table 6, fig. 2). This observation, in agreement with previously published data [Saier, 2003], confirms the postulate that channels with larger numbers of TMSs arose as a result of intragenic duplication events, and also confirms the presence of repeat sequences in many of these channel proteins.

Table 6

Distribution of topological types among analyzed TC subclasses using three prediction programs

The situation for channel-forming toxins is strikingly different. Table 6 depicts the average percentages and numbers of topological predictions obtained with the top three programs. MEMSAT predicted a significantly larger number of 1 TMS toxins than either of its counterparts; 357 compared to 151 for HMMTOP and 180 for SPOCTOPUS. Conversely, a substantially lower number of 0 TMS predictions were made by MEMSAT, predicting only 17 out of 408 proteins. This is in stark contrast to HMMTOP and SPOCTOPUS, which predicted 205 and 193 proteins, respectively, to have 0 TMSs. Upon examination of these results, it became clear that MEMSAT consistently overpredicted the number of 1 TMS proteins, and severely underpredicted the number of 0 TMS proteins. Analysis of subclass 1.C proteins using solely SPOCTOPUS and HMMTOP showed that a large percentage of toxins, 49%, were predicted to lack α-helical TMSs; 40% were predicted to have 1 TMS; 8%, 2 TMSs and 2%, 3 TMSs (see also table 6, fig. 4). No toxin was predicted to exhibit more than 4 TMSs. Recalling that these proteins can exist in both soluble and membrane-integrated forms, it is not surprising that half of them lack observable TMSs. The structures of most of the membrane-integrated forms are unknown, but some of them integrate as transmembrane β-structured proteins [Berne et al., 2005]. Most subclass 1.C channels are non-specific, transporting ions and small metabolites as well as proteins in some cases.
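
The percentages quoted for SPOCTOPUS and HMMTOP together can be approximated by averaging the two programs' counts, as in the Python sketch below. Two assumptions are made here and are not stated in the text: that the total of 408 subclass 1.C proteins applies to all three programs, and that the combined percentage is a simple mean of the two programs' values. The names used are illustrative.

```python
TOTAL_TOXINS = 408  # assumed to be the same protein set for all three programs

# Numbers of subclass 1.C proteins predicted to have 0 or 1 TMS (from the text).
toxin_counts = {
    "HMMTOP": {0: 205, 1: 151},
    "MEMSAT": {0: 17, 1: 357},
    "SPOCTOPUS": {0: 193, 1: 180},
}


def mean_percent(n_tms: int, programs: list) -> float:
    """Mean percentage of toxins assigned n_tms TMSs across the given programs."""
    percents = [100.0 * toxin_counts[prog][n_tms] / TOTAL_TOXINS for prog in programs]
    return sum(percents) / len(percents)


pair = ["HMMTOP", "SPOCTOPUS"]
mean_0_tms = mean_percent(0, pair)
mean_1_tms = mean_percent(1, pair)
memsat_0_tms = mean_percent(0, ["MEMSAT"])

print(f"HMMTOP/SPOCTOPUS mean: {mean_0_tms:.1f}% with 0 TMSs, {mean_1_tms:.1f}% with 1 TMS")
print(f"MEMSAT alone: {memsat_0_tms:.1f}% with 0 TMSs")
```

Under these assumptions, the two-program means are roughly 49% for 0 TMSs and 41% for 1 TMS, close to the 49% and 40% reported, while MEMSAT alone assigns only about 4% of the toxins to the 0 TMS class.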

Holins exhibit a very distinctive topological pattern. About 4% of holins were predicted to have 0 TMSs, roughly 35% exhibit 1 TMS, about 30% display 2 apparent TMSs, about 20% have 3 TMSs, and the remainder were predicted to have 4 TMSs (see table 6, fig. 3). No holin or putative holin was identified with more than 4 putative TMSs. These results indicate that, in contrast to subclass 1.A channels, there has been little intragenic duplication of transmembrane regions for members of subclass 1.E, although in one holin family, whose members have either 2 or 4 TMSs, the 4 TMS proteins proved to be the product of a 2 TMS duplication. These observations are readily explained, since most holins form oligomeric channels of low specificity that evolved to export autolysins. Some holins have been shown to form gigantic pores, where virtually all of the subunits in the cell form the borders [Dewey et al., 2010; Savva et al., 2008], but others, called pinholins, form well-defined small pores [Pang et al., 2009].

Secondary carriers (TC subclass 2.A) exhibit a very different pattern from that noted for any of these three subclasses of channel proteins. Very few proteins are predicted to have 0, 1, 2 or 3 TMSs. Of these proteins, 62% proved to be members of the Mitochondrial Carrier Family (2.A.29) and are clearly mispredictions. It can be seen in figure 5 that the orders of prevalence of proteins with different predicted topologies (different numbers of putative TMSs) are 1 < 2 < 3 < 4 < 5 < 6, and 7 < 8 < 9 < 10 < 11 < 12. Furthermore, far fewer proteins are predicted to have 1, 2 or 3 TMSs than 4, 5 or 6 TMSs, and far fewer are predicted to have 7, 8 or 9 TMSs than 10, 11 or 12 TMSs. We believe that the considerable number of proteins predicted to have odd numbers of TMSs is, in part, an artifact of program mispredictions. It is notable that 6 and 12 TMS proteins are the most prevalent types of carriers. Additionally, the order of prevalence of proteins predicted to have 12 or more TMSs is 12 > 13 > 14 > 15 > 16 > 17. Once again, the surprisingly large numbers of proteins with odd numbers of TMSs are at least partially due to program errors, but it is noteworthy that a number of carriers have been confirmed to have odd numbers of TMSs (5, 7, 9, 11 or 13) [Jack et al., 2000; Nugent and Jones, 2010; Young et al., 1999].
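The kind of tabulation underlying these comparisons can be sketched in a few lines of Python. The example below is a minimal illustration, not the TM-STATS implementation: it builds a histogram from a list of predicted TMS counts and reports the fractions of proteins with even and odd numbers of TMSs. The input list is hypothetical and does not reproduce any data from this study; the function names are likewise assumptions.

```python
from collections import Counter

def tms_histogram(predicted_counts):
    """Map each predicted TMS count to the number of proteins with that count."""
    return dict(sorted(Counter(predicted_counts).items()))

def parity_fractions(histogram):
    """Return (even_fraction, odd_fraction) over proteins with at least 1 TMS."""
    even = sum(n for tms, n in histogram.items() if tms > 0 and tms % 2 == 0)
    odd = sum(n for tms, n in histogram.items() if tms % 2 == 1)
    total = even + odd
    if total == 0:
        return 0.0, 0.0
    return even / total, odd / total

# Hypothetical predictions (one TMS count per protein), NOT data from the study.
example = [12, 12, 12, 11, 10, 6, 6, 5, 4, 14, 13, 12, 6, 9, 10]

hist = tms_histogram(example)
even_frac, odd_frac = parity_fractions(hist)
most_common = max(hist, key=hist.get)  # topology with the most proteins

for tms, n in hist.items():
    print(f"{tms:2d} TMSs: {'#' * n} ({n})")
print(f"even: {even_frac:.2f}, odd: {odd_frac:.2f}, most prevalent: {most_common} TMSs")
```

Proteins with 0 predicted TMSs are excluded from the parity calculation because, for carriers, such predictions are treated in the text as mispredictions rather than as a genuine topological class.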

These results, taken together, are consistent with the model that simple channels with 1, 2 or 3 TMSs were the precursors of larger channel and carrier proteins that arose by intragenic multiplication events. The predominance of proteins with even numbers of TMSs is in agreement with this model, as duplication and quadruplication events occurred more frequently than triplication events, and the most frequent basic repeat unit in many carriers appears to be a single 2 or 3 TMS element [Matias et al., 2010; Reddy et al., 2012]. While we and other laboratories have identified the internal repeats in many of these proteins, further research will be required to extend, quantify and confirm the results of these studies (fig. 6).
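The arithmetic behind this model can be illustrated with a toy enumeration. The Python sketch below starts from precursor units of 1, 2 or 3 TMSs and applies at most two successive multiplication events, each a duplication or a triplication, keeping products of up to 12 TMSs. All of these limits are assumptions chosen for illustration (a quadruplication is represented here as two duplications), and the enumeration says nothing about how often each route actually occurred.

```python
from collections import defaultdict

BASIC_UNITS = (1, 2, 3)   # TMSs in the assumed precursor peptide
FACTORS = (2, 3)          # duplication or triplication per event (assumption)
MAX_EVENTS = 2            # at most two successive multiplication events
MAX_TMS = 12              # ignore products larger than this

def multiplication_routes():
    """Enumerate topologies reachable from the basic units, recording each route."""
    routes = defaultdict(list)
    frontier = [(unit, (unit,)) for unit in BASIC_UNITS]
    for _ in range(MAX_EVENTS):
        next_frontier = []
        for tms, path in frontier:
            for factor in FACTORS:
                product = tms * factor
                if product <= MAX_TMS:
                    new_path = path + (factor,)
                    routes[product].append(new_path)
                    next_frontier.append((product, new_path))
        frontier = next_frontier
    return dict(sorted(routes.items()))

routes = multiplication_routes()
for tms, paths in routes.items():
    described = "; ".join(" x ".join(str(p) for p in path) for path in paths)
    print(f"{tms:2d} TMSs via {described}")
```

Under these assumptions, 6 and 12 TMS topologies are reachable by several distinct routes, whereas 5, 7 and 11 TMS topologies are not reachable at all. This is consistent with the view that confirmed odd-numbered topologies require additional events, such as the gain or loss of individual TMSs, beyond simple multiplication of a repeat unit.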

Fig. 6

Schematic depiction of the proposed pathway for the evolution of transport proteins including different types of channel-forming proteins and secondary carriers. We further propose that primary active transport carriers and group translocators arose by the superimposition of energy-coupling enzymes such as ATPases. Finally, the integration of these systems into metabolic pathways resulted in the physical construction of complex but coordinated metabolons.


We have found that several families of channel-forming proteins have 0 or 1 predicted TMS and function in at least two capacities. In some, one function is a soluble catalytic (enzymatic) or chaperone function, while the other is a membrane-integrated channel-forming function. In other dual-function proteins, a regulatory function is present in addition to the primary channel-forming function. It seems likely that in the former cases, the soluble function evolved as the primary function, while the channel function was secondarily acquired, but in the latter cases, channel formation may have arisen first, and a receptor, signal transduction or other regulatory function may have evolved secondarily. Several examples of transporters that have evolved receptor or regulatory functions are known, but it is surprising how seldom this functional transition has occurred.

In summary, almost all large α-type channels and carriers appear to have evolved from small channel-forming peptides by intragenic multiplication processes; the membrane insertion of soluble proteins to form channels is only occasionally observed, and this pathway for the evolution of carriers seems never to have been taken. Moreover, ‘once a transporter, always a transporter’ seems to be the general rule, violated by only occasional exceptions such as those noted above. Thus, transporters in general evolved as a distinct class of proteins, independently of other protein classes such as enzymes, structural proteins and most regulatory proteins. The constraints imposed on these evolutionary processes are only now emerging, although their molecular bases are not understood [Norris et al., 2007a, b]. This is a fertile area for future studies.

We thank Ake Vastermark for assistance with manuscript preparation and submission. This work was supported by NIH grant GM077402 from the National Institute of General Medical Sciences.

1.
Andrade SL, Dickmanns A, Ficner R, Einsle O: Crystal structure of the archaeal ammonium transporter Amt-1 from Archaeoglobus fulgidus. Proc Natl Acad Sci USA 2005;102:14994-14999.
[PubMed]
2.
Arbel N, Shoshan-Barmatz V: Voltage-dependent anion channel 1-based peptides interact with Bcl-2 to prevent antiapoptotic activity. J Biol Chem 2010;285:6053-6062.
[PubMed]
3.
Arispe N, De Maio A: ATP and ADP modulate a cation channel formed by Hsc70 in acidic phospholipid membranes. J Biol Chem 2000;275:30839-30843.
[PubMed]
4.
Banci L, Bertini I, Cantini F, Ciofi-Baffoni S: Cellular copper distribution: a mechanistic systems biology approach. Cell Mol Life Sci 2010;67:2563-2589.
[PubMed]
5.
Barabote RD, Tamang DG, Abeywardena SN, Fallah NS, Fu JY, Lio JK, Mirhosseini P, Pezeshk R, Podell S, Salampessy ML, Thever MD, Saier MH Jr: Extra domains in secondary transport carriers and channel proteins. Biochim Biophys Acta 2006;1758:1557-1579.
[PubMed]
6.
Bass RB, Strop P, Barclay M, Rees DC: Crystal structure of Escherichia coli MscS, a voltage-modulated and mechanosensitive channel. Science 2002;298:1582-1587.
[PubMed]
7.
Basu S, Mukherjee S, Adhya S: Proton-guided movements of tRNA within the Leishmania mitochondrial RNA import complex. Nucleic Acids Res 2008;36:1599-1609.
[PubMed]
8.
Bayles KW: Are the molecular strategies that control apoptosis conserved in bacteria? Trends Microbiol 2003;11:306-311.
[PubMed]
9.
Bechinger B: Structure and functions of channel-forming peptides: magainins, cecropins, melittin and alamethicin. J Membr Biol 1997;156:197-211.
[PubMed]
10.
Berne S, Sepcic K, Anderluh G, Turk T, Macek P, Poklar Ulrih N: Effect of pH on the pore-forming activity and conformational stability of ostreolysin, a lipid raft-binding protein from the edible mushroom Pleurotus ostreatus. Biochemistry 2005;44:11137-11147.
[PubMed]
11.
Bernsel A, Viklund H, Hennerdal A, Elofsson A: TOPCONS: consensus prediction of membrane protein topology. Nucleic Acids Res 2009;37:W465-W468.
[PubMed]
12.
Cavard D: Assembly of colicin A in the outer membrane of producing Escherichia coli cells requires both phospholipase A and one porin, but phospholipase A is sufficient for secretion. J Bacteriol 2002;184:3723-3733.
[PubMed]
13.
Chang AB, Lin R, Keith Studley W, Tran CV, Saier MH Jr: Phylogeny as a guide to structure and function of membrane transport proteins. Mol Membr Biol 2004;21:171-181.
[PubMed]
14.
Chen YR, Yang TY, Lei GS, Lin LJ, Chak KF: Delineation of the translocation of colicin E7 across the inner membrane of Escherichia coli. Arch Microbiol 2011;193:419-428.
[PubMed]
15.
Cheung JY, Zhang XQ, Song J, Gao E, Chan TO, Rabinowitz JE, Koch WJ, Feldman AM, Wang J: Coordinated regulation of cardiac Na+/Ca2+ exchanger and Na+,K+-ATPase by phospholemman (FXYD1). Adv Exp Med Biol 2013;961:175-190.
[PubMed]
16.
Connolly CN: Trafficking of 5-HT3 and GABAA receptors (review). Mol Membr Biol 2008;25:293-301.
[PubMed]
17.
Coste B, Mathur J, Schmidt M, Earley TJ, Ranade S, Petrus MJ, Dubin AE, Patapoutian A: Piezo1 and Piezo2 are essential components of distinct mechanically activated cation channels. Science 2010;330:55-60.
[PubMed]
18.
Cserzo M, Eisenhaber F, Eisenhaber B, Simon I: On filtering false positive transmembrane protein predictions. Protein Eng 2002;15:745-752.
[PubMed]
19.
Damman CJ, Eggers CH, Samuels DS, Oliver DB: Characterization of Borrelia burgdorferi BlyA and BlyB proteins: a prophage-encoded holin-like system. J Bacteriol 2000;182:6791-6797.
[PubMed]
20.
Dewey JS, Savva CG, White RL, Vitha S, Holzenburg A, Young R: Micron-scale holes terminate the phage infection cycle. Proc Natl Acad Sci USA 2010;107:2219-2223.
[PubMed]
21.
Du GG, Sandhu B, Khanna VK, Guo XH, MacLennan DH: Topology of the Ca2+ release channel of skeletal muscle sarcoplasmic reticulum (RyR1). Proc Natl Acad Sci USA 2002;99:16725-16730.
[PubMed]
22.
Dumay QC, Debut AJ, Mansour NM, Saier MH Jr: The copper transporter (Ctr) family of Cu+ uptake systems. J Mol Microbiol Biotechnol 2006;11:10-19.
[PubMed]
23.
Elinder F, Nilsson J, Arhem P: On the opening of voltage-gated ion channels. Physiol Behav 2007;92:1-7.
[PubMed]
24.
Fischer WB, Wang YT, Schindler C, Chen CP: Mechanism of function of viral channel proteins and implications for drug development. Int Rev Cell Mol Biol 2012;294:259-321.
[PubMed]
25.
Gallagher AR, Germino GG, Somlo S: Molecular advances in autosomal dominant polycystic kidney disease. Adv Chronic Kidney Dis 2010;17:118-130.
[PubMed]
26.
Gao X, Zhou L, Jiao X, Lu F, Yan C, Zeng X, Wang J, Shi Y: Mechanism of substrate recognition and transport by an amino acid antiporter. Nature 2010;463:828-832.
[PubMed]
27.
Geering K: FXYD proteins: new regulators of Na,K-ATPase. Am J Physiol Renal Physiol 2006;290:F241-F250.
[PubMed]
28.
Goddard AD, Moir JW, Richardson DJ, Ferguson SJ: Interdependence of two NarK domains in a fused nitrate/nitrite transporter. Mol Microbiol 2008;70:667-681.
[PubMed]
29.
Gonen T, Walz T: The structure of aquaporins. Q Rev Biophys 2006;39:361-396.
[PubMed]
30.
Gonzales EB, Kawate T, Gouaux E: Pore architecture and ion sites in acid-sensing ion channels and P2X receptors. Nature 2009;460:599-604.
[PubMed]
31.
Gouaux E: Structure and function of AMPA receptors. J Physiol 2004;554:249-253.
[PubMed]
32.
Graschopf A, Blasi U: Molecular function of the dual-start motif in the lambda S holin. Mol Microbiol 1999;33:569-582.
[PubMed]
33.
Guan L, Smirnova IN, Verner G, Nagamori S, Kaback HR: Manipulating phospholipids for crystallization of a membrane transport protein. Proc Natl Acad Sci USA 2006;103:1723-1726.
[PubMed]
34.
He X, Szewczyk P, Karyakin A, Evin M, Hong WX, Zhang Q, Chang G: Structure of a cation-bound multidrug and toxic compound extrusion transporter. Nature 2010;467:991-994.
[PubMed]
35.
He Y, Ramsay AJ, Hunt ML, Whitbread AK, Myers SA, Hooper JD: N-glycosylation analysis of the human Tweety family of putative chloride ion channels supports a penta-spanning membrane arrangement: impact of N-glycosylation on cellular processing of Tweety homologue 2 (TTYH2). Biochem J 2008;412:45-55.
[PubMed]
36.
Herzig S, Raemy E, Montessuit S, Veuthey JL, Zamboni N, Westermann B, Kunji ER, Martinou JC: Identification and functional expression of the mitochondrial pyruvate carrier. Science 2012;337:93-96.
[PubMed]
37.
Hirokawa T, Boon-Chieng S, Mitaku S: SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 1998;14:378-379.
[PubMed]
38.
Hvorup RN, Winnen B, Chang AB, Jiang Y, Zhou XF, Saier MH Jr: The multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) exporter superfamily. Eur J Biochem 2003;270:799-813.
[PubMed]
39.
Jack DL, Paulsen IT, Saier MH: The amino acid/polyamine/organocation (APC) superfamily of transporters specific for amino acids, polyamines and organocations. Microbiology 2000;146:1797-1814.
[PubMed]
40.
Jiang D, Zhao L, Clapham DE: Genome-wide RNAi screen identifies Letm1 as a mitochondrial Ca2+/H+ antiporter. Science 2009;326:144-147.
[PubMed]
41.
Kall L, Krogh A, Sonnhammer EL: A combined transmembrane topology and signal peptide prediction method. J Mol Biol 2004;338:1027-1036.
[PubMed]
42.
Kato Y, Sakaguchi M, Mori Y, Saito K, Nakamura T, Bakker EP, Sato Y, Goshima S, Uozumi N: Evidence in support of a four transmembrane-pore-transmembrane topology model for the Arabidopsis thaliana Na+/K+ translocating AtHKT1 protein, a member of the superfamily of K+ transporters. Proc Natl Acad Sci USA 2001;98:6488-6493.
[PubMed]
43.
Khademi S, O'Connell J 3rd, Remis J, Robles-Colmenares Y, Miercke LJ, Stroud RM: Mechanism of ammonia transport by Amt/MEP/Rh: structure of AmtB at 1.35 Å. Science 2004;305:1587-1594.
[PubMed]
44.
Kim SE, Coste B, Chadha A, Cook B, Patapoutian A: The role of Drosophila Piezo in mechanical nociception. Nature 2012;483:209-212.
[PubMed]
45.
Kowdley GC, Ackerman SJ, Chen Z, Szabo G, Jones LR, Moorman JR: Anion, cation, and zwitterion selectivity of phospholemman channel molecules. Biophys J 1997;72:141-145.
[PubMed]
46.
Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001;305:567-580.
[PubMed]
47.
Kuan J, Saier MH Jr: The mitochondrial carrier family of transport proteins: structural, functional, and evolutionary relationships. Crit Rev Biochem Mol Biol 1993;28:209-233.
[PubMed]
48.
Labrie SJ, Samson JE, Moineau S: Bacteriophage resistance mechanisms. Nat Rev Microbiol 2010;8:317-327.
[PubMed]
49.
Lam VH, Lee JH, Silverio A, Chan H, Gomolplitinant KM, Povolotsky TL, Orlova E, Sun EI, Welliver CH, Saier MH Jr: Pathways of transport protein evolution: recent advances. Biol Chem 2011;392:5-12.
[PubMed]
50.
Latorre R, Zaelzer C, Brauchi S: Structure-functional intimacies of transient receptor potential channels. Q Rev Biophys 2009;42:201-246.
[PubMed]
51.
Laudon H, Hansson EM, Melen K, Bergman A, Farmery MR, Winblad B, Lendahl U, von Heijne G, Naslund J: A nine-transmembrane domain topology for presenilin 1. J Biol Chem 2005;280:35352-35360.
[PubMed]
52.
Lee AG: Biological membranes: the importance of molecular detail. Trends Biochem Sci 2011;36:493-500.
[PubMed]
53.
Lo A, Chiu HS, Sung TY, Lyu PC, Hsu WL: Enhanced membrane protein topology prediction using a hierarchical classification method and a new scoring function. J Proteome Res 2008;7:487-496.
[PubMed]
54.
Lu F, Li S, Jiang Y, Jiang J, Fan H, Lu G, Deng D, Dang S, Zhang X, Wang J, Yan N: Structure and mechanism of the uracil transporter UraA. Nature 2011;472:243-246.
[PubMed]
55.
Maeda S, Nakagawa S, Suga M, Yamashita E, Oshima A, Fujiyoshi Y, Tsukihara T: Structure of the connexin 26 gap junction channel at 3.5 Å resolution. Nature 2009;458:597-602.
[PubMed]
56.
Matias MG, Gomolplitinant KM, Tamang DG, Saier MH Jr: Animal Ca2+ release-activated Ca2+ (CRAC) channels appear to be homologous to and derived from the ubiquitous cation diffusion facilitators. BMC Res Notes 2010;3:158.
[PubMed]
57.
Mayer ML: Glutamate receptors at atomic resolution. Nature 2006;440:456-462.
[PubMed]
58.
McNeil AK, Rescher U, Gerke V, McNeil PL: Requirement for annexin A1 in plasma membrane repair. J Biol Chem 2006;281:35202-35207.
[PubMed]
59.
Mok T, Chen J, Shlykov M, Saier M Jr: Bioinformatic analyses of bacterial mercury ion (Hg2+) transporters. Water Air Soil Pollut 2012;223:4443-4457.
60.
Moorman JR, Ackerman SJ, Kowdley GC, Griffin MP, Mounsey JP, Chen Z, Cala SE, O'Brian JJ, Szabo G, Jones LR: Unitary anion currents through phospholemman channel molecules. Nature 1995;377:737-740.
[PubMed]
61.
Nakayama K, Takashima K, Ishihara H, Shinomiya T, Kageyama M, Kanaya S, Ohnishi M, Murata T, Mori H, Hayashi T: The R-type pyocin of Pseudomonas aeruginosa is related to P2 phage, and the F-type is related to lambda phage. Mol Microbiol 2000;38:213-231.
[PubMed]
62.
Nelson RD, Kuan G, Saier MH Jr, Montal M: Modular assembly of voltage-gated channel proteins: a sequence analysis and phylogenetic study. J Mol Microbiol Biotechnol 1999;1:281-287.
[PubMed]
63.
Nilius B, Eggermont J, Voets T, Droogmans G: Volume-activated Cl- channels. Gen Pharmacol 1996;27:1131-1140.
[PubMed]
64.
Norris AJ, Foeger NC, Nerbonne JM: Neuronal voltage-gated K+ (Kv) channels function in macromolecular complexes. Neurosci Lett 2010;486:73-77.
[PubMed]
65.
Norris V, den Blaauwen T, Cabin-Flaman A, Doi RH, Harshey R, Janniere L, Jimenez-Sanchez A, Jin DJ, Levin PA, Mileykovskaya E, Minsky A, Saier M Jr, Skarstad K: Functional taxonomy of bacterial hyperstructures. Microbiol Mol Biol Rev 2007a;71:230-253.
[PubMed]
66.
Norris V, den Blaauwen T, Doi RH, Harshey RM, Janniere L, Jimenez-Sanchez A, Jin DJ, Levin PA, Mileykovskaya E, Minsky A, Misevic G, Ripoll C, Saier M Jr, Skarstad K, Thellier M: Toward a hyperstructure taxonomy. Annu Rev Microbiol 2007b;61:309-329.
[PubMed]
67.
Nugent T, Jones DT: Predicting transmembrane helix packing arrangements using residue contacts and a force-directed algorithm. PLoS Comput Biol 2010;6:e1000714.
[PubMed]
68.
Ostroumova OS, Schagina LV, Mosevitsky MI, Zakharov VV: Ion channel activity of brain abundant protein BASP1 in planar lipid bilayers. FEBS J 2011;278:461-469.
[PubMed]
69.
Palmieri F: The mitochondrial transporter family SLC25: identification, properties and physiopathology. Mol Aspects Med 2013;34:465-484.
[PubMed]
70.
Pang T, Savva CG, Fleming KG, Struck DK, Young R: Structure of the lethal phage pinhole. Proc Natl Acad Sci USA 2009;106:18966-18971.
[PubMed]
71.
Pao SS, Paulsen IT, Saier MH Jr: Major facilitator superfamily. Microbiol Mol Biol Rev 1998;62:1-34.
[PubMed]
72.
Park JH, Saier MH Jr: Phylogenetic characterization of the MIP family of transmembrane channel proteins. J Membr Biol 1996;153:171-180.
[PubMed]
73.
Passamonti S, Terdoslavich M, Margon A, Cocolo A, Medic N, Micali F, Decorti G, Franko M: Uptake of bilirubin into HepG2 cells assayed by thermal lens spectroscopy. Function of bilitranslocase. FEBS J 2005;272:5522-5535.
[PubMed]
74.
Paulsen IT, Brown MH, Littlejohn TG, Mitchell BA, Skurray RA: Multidrug resistance proteins QacA and QacB from Staphylococcus aureus: membrane topology and identification of residues involved in substrate specificity. Proc Natl Acad Sci USA 1996;93:3630-3635.
[PubMed]
75.
Petris MJ: The SLC31 (Ctr) copper transporter family. Pflugers Arch 2004;447:752-755.
[PubMed]
76.
Pivetti CD, Yen MR, Miller S, Busch W, Tseng YH, Booth IR, Saier MH Jr: Two families of mechanosensitive channel proteins. Microbiol Mol Biol Rev 2003;67:66-85.
[PubMed]
77.
Real G, Pinto SM, Schyns G, Costa T, Henriques AO, Moran CP Jr: A gene encoding a holin-like protein involved in spore morphogenesis and spore germination in Bacillus subtilis. J Bacteriol 2005;187:6443-6453.
[PubMed]
78.
Reddy V, Shlykov MA, Castillo R, Sun EI, Saier MH Jr: The major facilitator superfamily (MFS) revisited. FEBS J 2012;279:2022-2035.
[PubMed]
79.
Ritter M, Ravasio A, Jakab M, Chwatal S, Furst J, Laich A, Gschwentner M, Signorelli S, Burtscher C, Eichmuller S, Paulmichl M: Cell swelling stimulates cytosol to membrane transposition of ICln. J Biol Chem 2003;278:50163-50174.
[PubMed]
80.
Rydman PS, Bamford DH: Identification and mutational analysis of bacteriophage PRD1 holin protein P35. J Bacteriol 2003;185:3795-3803.
[PubMed]
81.
Saier MH Jr: Families of proteins forming transmembrane channels. J Membr Biol 2000;175:165-180.
[PubMed]
82.
Saier MH Jr: Tracing pathways of transport protein evolution. Mol Microbiol 2003;48:1145-1156.
[PubMed]
83.
Saier MH Jr, Reddy VS, Tamang DG, Vastermark A: The transporter classification database. Nucleic Acids Res 2014;42:D251-D258.
[PubMed]
84.
Saier MH Jr, Tran CV, Barabote RD: TCDB: The transporter classification database for membrane transport protein analyses and information. Nucleic Acids Res 2006;34:D181-D186.
[PubMed]
85.
Saier MH Jr, Yen MR, Noto K, Tamang DG, Elkan C: The transporter classification database: recent advances. Nucleic Acids Res 2009;37:D274-D278.
[PubMed]
86.
Saier MH, Ma CH, Rodgers L, Tamang DG, Yen MR: Protein secretion and membrane insertion systems in bacteria and eukaryotic organelles. Adv Appl Microbiol 2008;65:141-197.
[PubMed]
87.
Saris NE, Andersson MA, Mikkola R, Andersson LC, Teplova VV, Grigoriev PA, Salkinoja-Salonen MS: Microbial toxin's effect on mitochondrial survival by increasing K+ uptake. Toxicol Ind Health 2009;25:441-446.
[PubMed]
88.
Sato T, Mihara K: Topogenesis of mammalian Oxa1, a component of the mitochondrial inner membrane protein export machinery. J Biol Chem 2009;284:14819-14827.
[PubMed]
89.
Savva CG, Dewey JS, Deaton J, White RL, Struck DK, Holzenburg A, Young R: The holin of bacteriophage lambda forms rings with large diameter. Mol Microbiol 2008;69:784-793.
[PubMed]
90.
Sheehan MM, Garcia JL, Lopez R, Garcia P: The lytic enzyme of the pneumococcal phage Dp-1: a chimeric lysin of intergeneric origin. Mol Microbiol 1997;25:717-725.
[PubMed]
91.
Silverio AL, Saier MH Jr: Bioinformatic characterization of the trimeric intracellular cation-specific channel protein family. J Membr Biol 2011;241:77-101.
[PubMed]
92.
Tan KS, Wee BY, Song KP: Evidence for holin function of tcdE gene in the pathogenicity of Clostridium difficile. J Med Microbiol 2001;50:613-619.
[PubMed]
93.
Thompson AJ, Williamson R: Protocol for quantitative proteomics of cellular membranes and membrane rafts. Methods Mol Biol 2010;658:235-253.
[PubMed]
94.
Tsukazaki T, Mori H, Echizen Y, Ishitani R, Fukai S, Tanaka T, Perederina A, Vassylyev DG, Kohno T, Maturana AD, Ito K, Nureki O: Structure and function of a membrane component SecDF that enhances protein export. Nature 2011;474:235-238.
[PubMed]
95.
Tu H, Nelson O, Bezprozvanny A, Wang Z, Lee SF, Hao YH, Serneels L, De Strooper B, Yu G, Bezprozvanny I: Presenilins form ER Ca2+ leak channels, a function disrupted by familial Alzheimer's disease-linked mutations. Cell 2006;126:981-993.
[PubMed]
96.
Tusnady GE, Simon I: The HMMTOP transmembrane topology prediction server. Bioinformatics 2001;17:849-850.
[PubMed]
97.
Viklund H, Bernsel A, Skwark M, Elofsson A: SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics 2008;24:2928-2929.
[PubMed]
98.
Von Rozycki T, Schultzel MA, Saier MH Jr: Sequence analyses of cyanobacterial bicarbonate transporters and their homologues. J Mol Microbiol Biotechnol 2004;7:102-108.
[PubMed]
99.
Vukov N, Moll I, Blasi U, Scherer S, Loessner MJ: Functional regulation of the Listeria monocytogenes bacteriophage A118 holin by an intragenic inhibitor lacking the first transmembrane domain. Mol Microbiol 2003;48:173-186.
[PubMed]
100.
Wang W, Black SS, Edwards MD, Miller S, Morrison EL, Bartlett W, Dong C, Naismith JH, Booth IR: The structure of an open form of an E. coli mechanosensitive channel at 3.45 Å resolution. Science 2008;321:1179-1183.
[PubMed]
101.
Yamaguchi A, Tamang DG, Saier MH: Mercury transport in bacteria. Water Air Soil Pollut 2007;182:219-234.
102.
Yang YD, Cho H, Koo JY, Tak MH, Cho Y, Shim WS, Park SP, Lee J, Lee B, Kim BM, Raouf R, Shin YK, Oh U: TMEM16A confers receptor-activated calcium-dependent chloride conductance. Nature 2008;455:1210-1215.
[PubMed]
103.
Yao CK, Lin YQ, Ly CV, Ohyama T, Haueter CM, Moiseenkova-Bell VY, Wensel TG, Bellen HJ: A synaptic vesicle-associated Ca2+ channel promotes endocytosis and couples exocytosis to endocytosis. Cell 2009;138:947-960.
[PubMed]
104.
Yonekura K, Maki-Yonekura S, Homma M: Structure of the flagellar motor protein complex PomAB: implications for the torque-generating conformation. J Bacteriol 2011;193:3863-3870.
[PubMed]
105.
Young GB, Jack DL, Smith DW, Saier MH Jr: The amino acid/auxin: proton symport permease family. Biochim Biophys Acta 1999;1415:306-322.
[PubMed]
106.
Zeng GF, Pypaert M, Slayman CL: Epitope tagging of the yeast K+ carrier Trk2p demonstrates folding that is consistent with a channel-like structure. J Biol Chem 2004;279:3003-3013.
[PubMed]
107.
Zhai Y, Saier MH Jr: A web-based program for the prediction of average hydropathy, average amphipathicity and average similarity of multiply aligned homologous proteins. J Mol Microbiol Biotechnol 2001a;3:285-286.
[PubMed]
108.
Zhai Y, Saier MH Jr: A web-based program (WHAT) for the simultaneous prediction of hydropathy, amphipathicity, secondary structure and transmembrane topology for a single protein sequence. J Mol Microbiol Biotechnol 2001b;3:501-502.
[PubMed]