Consensus score formulation
For the first two comparisons (DE analyses), genes were ranked by their combined fold-change and significance (FDR-adjusted p value). Fold-changes were ranked directly, with higher ranks assigned to genes with greater positive log2FC (tumor/normal), and vice versa. Prior to ranking p values, the associated FC direction was incorporated to generate directional p values (pdir) for each gene i (analogous to the approach described in Väremo et al., 2013):
$$p_{\mathrm{dir},i} = \frac{1 - \operatorname{sign}(\mathrm{FC}_i)\,(1 - p_i)}{2}$$
where sign(FC) is the sign of the corresponding log2(fold-change). In this manner, genes with low p values and a positive FC receive a pdir near zero, whereas genes with low p values but a negative FC have a pdir close to one. Genes associated with a high p value will therefore have a pdir near 0.5, regardless of FC direction. These pdir values were then ranked such that higher ranks were assigned to genes with lower pdir values. Finally, the p-like scores generated from the third comparison (tumor versus all tissues) were ranked directly, where low p-scores (high significance) were ranked highly, and vice versa.
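The ranking inputs for the first two comparisons can be illustrated with a minimal Python sketch. It assumes the directional p value takes the form pdir = (1 − sign(FC)·(1 − p))/2, which reproduces the three limiting cases described above (near 0, near 1, and near 0.5). The function names `directional_p` and `average_ranks`, the tie handling (tied values share their mean rank), and the three example genes are illustrative assumptions, not taken from the original analysis.

```python
def directional_p(p, log2fc):
    """Fold the FC sign into a p value (assumed form, see lead-in)."""
    sign = (log2fc > 0) - (log2fc < 0)  # -1, 0, or +1
    return (1 - sign * (1 - p)) / 2


def average_ranks(values):
    """Ascending ranks (1 = smallest value); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions in the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical example: three genes with log2FC (tumor/normal) and adjusted p values.
log2fc = [2.0, -1.5, 0.1]
p_adj = [0.01, 0.001, 0.9]

pdir = [directional_p(p, fc) for p, fc in zip(p_adj, log2fc)]

# Higher rank = greater positive log2FC.
fc_rank = average_ranks(log2fc)
# Higher rank = lower pdir, so rank the negated values.
pdir_rank = average_ranks([-x for x in pdir])
```

In this toy input the strongly up-regulated gene receives the highest rank on both scales, and the strongly down-regulated gene the lowest, even though it has the smallest raw p value.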
The consensus rank score was calculated by combining the gene ranks from each of the three comparative analyses, as illustrated in Figure 1A. Specifically, the FC and pdir ranks from the first comparison were averaged, and this mean rank was averaged with the mean of the FC and pdir ranks from the second comparison. The resulting combined rank was averaged with the rank of p-like scores from the third comparison to yield the overall consensus rank score, enabling the prediction of candidate biomarkers for each cancer type. The effective weight ratios from the three comparisons (tumor versus paired normal, tumor versus healthy tissue-of-origin, and tumor versus all healthy tissues) in the consensus score were therefore 1:1:2, respectively. The ratios were assigned as such because the score was designed to place equal weight on expression differences of tumor versus tissue-of-origin, and of tumor versus all tissues. Since comparisons 1 and 2 both quantify tumor versus tissue-of-origin differences, they were each assigned half the weight of comparison 3, which quantified tumor versus all tissue differences. Moreover, since the information from the first two comparisons is likely to exhibit more redundancy (paired normal tissue and healthy tissue-of-origin are relatively similar in their expression profiles compared to other tissue types), they were weighted less than comparison 3.
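The nested averaging described above, and the resulting 1:1:2 weighting, can be checked with a short Python sketch. The function name `consensus_rank`, its argument names, and the example rank values are hypothetical and chosen only for illustration.

```python
def consensus_rank(fc_rank_1, pdir_rank_1, fc_rank_2, pdir_rank_2, p_rank_3):
    """Combine one gene's ranks from the three comparisons by nested averaging."""
    comp1 = (fc_rank_1 + pdir_rank_1) / 2      # tumor vs. paired normal
    comp2 = (fc_rank_2 + pdir_rank_2) / 2      # tumor vs. healthy tissue-of-origin
    tissue_of_origin = (comp1 + comp2) / 2     # comparisons 1 and 2 combined
    return (tissue_of_origin + p_rank_3) / 2   # averaged with tumor vs. all tissues


# Hypothetical ranks for a single gene.
score = consensus_rank(10, 20, 30, 40, 50)

# The same score written as an explicit weighted sum with weights 1:1:2.
comp1, comp2, comp3 = (10 + 20) / 2, (30 + 40) / 2, 50
weighted = 0.25 * comp1 + 0.25 * comp2 + 0.5 * comp3
```

Expanding the nested means shows that comparisons 1 and 2 each contribute a quarter of the final score and comparison 3 contributes half, which matches the stated 1:1:2 ratio.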
Cancer types lacking paired-normal or healthy tissue data
Among the 32 TCGA cancer types with available primary tumor samples, 12 lacked sufficient paired-normal tissue data from TCGA to be included in the first comparison, and 6 types could not be appropriately matched to one of the tissue types defined in GTEx (e.g., SARC, “sarcoma”), and thus could not be included in the second comparison. However, the genes were still scored based on the results from the remaining comparisons that could be performed. Although there is less confidence associated with the scores for these particular cancer types, potential biomarkers could still be identified. For example, the top-scoring candidate for ovarian cancer (OV) was WFDC2 (also known as HE4), which is an established OV protein biomarker in both urine and serum (Hellström et al., 2003, 2010), and the next top 6 candidates included FOLR1, KLK6, KLK7, and MSLN, all of which have been experimentally confirmed as biofluid diagnostic markers of OV (Badgwell et al., 2007; Diamandis et al., 2003; Leung et al., 2013; Tamir et al., 2014).
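The text does not state exactly how the remaining ranks were combined when a comparison was unavailable. One plausible reading, assumed here and not confirmed by the source, is to keep the same nested averaging and simply drop the missing term. The function name `consensus_with_missing` and the example values are hypothetical.

```python
from typing import Optional


def consensus_with_missing(comp1: Optional[float],
                           comp2: Optional[float],
                           comp3: float) -> float:
    """Nested averaging that skips unavailable comparisons (assumed behavior).

    comp1: mean of FC and pdir ranks, tumor vs. paired normal (None if unavailable)
    comp2: mean of FC and pdir ranks, tumor vs. healthy tissue-of-origin (None if unavailable)
    comp3: rank of the p-like score, tumor vs. all healthy tissues
    """
    available = [r for r in (comp1, comp2) if r is not None]
    if not available:
        # Neither tissue-of-origin comparison exists: only comparison 3 remains.
        return comp3
    tissue_of_origin = sum(available) / len(available)
    return (tissue_of_origin + comp3) / 2


# Hypothetical cancer type with no paired-normal data (comparison 1 missing).
score_missing_first = consensus_with_missing(None, 35.0, 50.0)
```

Under this assumption, a cancer type missing one of the first two comparisons still places equal weight on tumor versus tissue-of-origin and tumor versus all tissues, consistent with the stated design goal of the score.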
Core secretome definition and analysis
To focus on changes in secretome expression associated specifically with malignant progression rather than inter-individual and inter-tissue variation, the analysis was conducted using paired tumor-normal samples from TCGA. Furthermore, cancer types with only a few sample pairs (CESC, PAAD, and PCPG; each had only 2 or 3 pairs) were excluded, yielding a final dataset spanning 17 cancer types, 683 patients, and 1,563 secretome genes (very low-count or non-detected genes were excluded).