Healthy Homo sapiens genes.
| Gene nature | Gene length range | Gene ID | Gene name | Block length |
|---|---|---|---|---|
| Hydrophobic | > 450 | ATBFRUCT1 | Glycosyl hydrolases family 32 protein | 541 |
| | | ATCWINV1 | Beta-fructofuranosidase | 537 |
| | | GLB1 | Galactosidase beta 1 | 678 |
| | | KDM1A | Lysine demethylase 1A | 686 |
| | | MYO1C | Myosin IC | 725 |
| | | cmeC | Multidrug efflux pump protein CmeC | 479 |
| | | TTHA1135 | ba3-type cytochrome C oxidase polypeptide I | 568 |
| | | acrB | Multidrug efflux system protein | 1057 |
| | | ECK0456 | Multidrug efflux pump subunit AcrB | 1057 |
| | | spr1652 | Cell wall surface anchor family protein | 648 |
| | | FSHMD1A | Facioscapulohumeral muscular dystrophy 1A | 802 |
| | | bamA | Outer membrane protein assembly factor BamA | 532 |
Measured phase values (deg) for Homo sapiens genes.
Gene type Gene ID Frequency in Hz
amino acid in the pure hydrophilic gene chain. The resistance Rn is the same for all hydrophobic and hydrophilic amino acids.
The recurrence relations for a gene chain consisting of both hydrophobic and hydrophilic residues are as follows:
where Gn = Nn/Dn, and Nn and Dn are polynomials of degree n for both hydrophilic and hydrophobic genes. Using these expressions, the transfer function can be computed for the electrical system model of an amino acid chain of arbitrary length.
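Once a transfer function is available as a ratio of two polynomials, its Bode magnitude and phase at a given frequency follow from evaluating both polynomials at s = jω. The sketch below illustrates only that evaluation step. The coefficients are hypothetical placeholders, not the Nn and Dn produced by the paper's recurrence relations, and the function names `polyval` and `bode_point` are illustrative.

```python
import cmath
import math

def polyval(coeffs, s):
    """Evaluate a polynomial at s with Horner's rule (highest power first)."""
    acc = 0j
    for c in coeffs:
        acc = acc * s + c
    return acc

def bode_point(num, den, freq_hz):
    """Return (magnitude in dB, phase in degrees) of G = N/D at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * freq_hz
    g = polyval(num, s) / polyval(den, s)
    return 20 * math.log10(abs(g)), math.degrees(cmath.phase(g))

# Hypothetical first-order example: N(s) = 1, D(s) = s + 1.
num = [1.0]
den = [1.0, 1.0]
mag_db, phase_deg = bode_point(num, den, 1 / (2 * math.pi))  # omega = 1 rad/s
print(round(mag_db, 2), round(phase_deg, 1))
```

For a longer chain, the same function would be called with the degree-n coefficient lists in place of the placeholders.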
3. Results and discussions
The genetic attributes are investigated by modeling a sensor network for each gene, which is tested on 40 gene databases (25 cancerous or hydrophilic and 15 non-cancerous or hydrophobic) (Table 2 and Table 3). The gene databases were downloaded from public-domain sources (http://www.ncbi.nlm.nih.gov; http://cgap.nci.nih.gov; http://www.genecards.org). The electrical responses of the sensor are simulated in the MATLAB (version R2009b) environment.
3.1. Behavior analysis of sensor network using Bode plot
Fig. 2. Gene sensor phase responses for cancer and non-cancer genes. At higher frequencies the phase response of a cancer gene is negative, whereas that of a non-cancer gene is positive. A. NUP214 vs. LOC107815086 gene phase plots. B. spr1652 vs. SHBG gene phase plots.
The sensor networks representing genes are analyzed in the frequency domain by observing their spectra. Differentiation between cancerous and non-cancerous genes is obtained by investigating the correlation between the gene features and their simulated system behavior.
Fig. 3. Confusion matrix of the binary classifier for gene classification. Genes are classified based on their hydrophilicity and hydrophobicity features.
Table 5 Performance evaluation metrics for genes at different frequencies.
Frequency (Hz) Accuracy MCC TP rate TN rate Precision (P) Precision (N)
The sensor behavior is studied using Bode magnitude and phase values in the frequency range of 1 Hz to 1 MHz, as detailed in Table 4. No marked differences are observed in the amplitude values; hence only the phase values of the electrical system models representing genes are considered. The plots in Fig. 2 are obtained by cascading the amino acid circuit models, where each amino acid has a constant Rb of 7 Ω for the backbone circuit and an LSC or CSC for the side chain, whose value depends on the hydropathy index of the hydrophobic or hydrophilic amino acid. Fig. 2 exhibits significant differences in phase response between cancerous and non-cancerous genes, and the phase values are markedly distinguished from each other within the frequency range of 50 kHz to 1 MHz.
The simulated results (Fig. 2) for cancerous genes show negative phase at higher frequencies because they are modeled by cascaded parallel RC circuits, which indicates that these genes contain a large proportion of hydrophilic (polar) amino acids. Non-cancerous genes, in contrast, exhibit positive phase at higher frequencies because they are realized by parallel RL circuits, which indicates that they are made up largely of hydrophobic amino acids. The cancerous and non-cancerous genes therefore exhibit polar and nonpolar characteristics respectively, which are clearly observed in the corresponding simulated phase responses, and the sensor realization matches the biological features (Stranzl et al., 2012) of the genes.
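The sign difference described above can be checked on a single stage. The following is a minimal sketch, not the authors' cascaded model: it computes the impedance phase of one parallel RC branch and one parallel RL branch, using the 7 Ω backbone resistance quoted in the text. The side-chain values `C_SC` and `L_SC` are hypothetical placeholders, since the actual values depend on hydropathy indices that are not listed here.

```python
import cmath
import math

R_B = 7.0     # ohms, backbone resistance quoted in the text
C_SC = 1e-6   # farads, hypothetical side-chain capacitance (assumption)
L_SC = 1e-6   # henries, hypothetical side-chain inductance (assumption)

def parallel(z1, z2):
    """Impedance of two elements connected in parallel."""
    return z1 * z2 / (z1 + z2)

def phase_deg_rc(freq_hz):
    """Phase (deg) of a parallel RC stage, standing in for a hydrophilic residue."""
    w = 2 * math.pi * freq_hz
    return math.degrees(cmath.phase(parallel(R_B, 1 / (1j * w * C_SC))))

def phase_deg_rl(freq_hz):
    """Phase (deg) of a parallel RL stage, standing in for a hydrophobic residue."""
    w = 2 * math.pi * freq_hz
    return math.degrees(cmath.phase(parallel(R_B, 1j * w * L_SC)))

# Sample the band in which the text reports the clearest separation.
for f in (50e3, 1e6):
    print(f"{f:>9.0f} Hz  RC: {phase_deg_rc(f):7.2f} deg  RL: {phase_deg_rl(f):7.2f} deg")
```

With any positive component values the RC stage gives a phase between -90° and 0° and the RL stage a phase between 0° and 90°, which is consistent with the sign pattern reported for Fig. 2.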
3.2. Performance evaluation of sensor characteristics
The gene datasets, collected from the national healthcare website, are classified by the sensor model. The sensor performance is judged by the receiver operating characteristic (ROC) curve and analyzed using the following metrics:
• Accuracy is the ratio of the number of correctly classified genes to the total number of genes.
• True positive rate or sensitivity (TPR) is the ratio of the number of correctly classified genes from the positive class (TP), i.e. cancer, to the number of all genes from the positive class (TP + FN).
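The two metrics defined so far can be computed directly from confusion-matrix counts. The sketch below uses hypothetical counts chosen only to match the 25 cancerous and 15 non-cancerous genes mentioned earlier. They are not the paper's results, and the function names are illustrative.

```python
def accuracy(tp, tn, fp, fn):
    """Correctly classified genes divided by all genes."""
    return (tp + tn) / (tp + tn + fp + fn)

def true_positive_rate(tp, fn):
    """Correctly classified positive (cancer) genes divided by all positive genes."""
    return tp / (tp + fn)

# Hypothetical counts (assumption): 25 positive genes, 15 negative genes.
tp, fn = 23, 2   # cancer genes classified correctly / incorrectly
tn, fp = 14, 1   # non-cancer genes classified correctly / incorrectly

acc = accuracy(tp, tn, fp, fn)
tpr = true_positive_rate(tp, fn)
print(acc, tpr)
```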