
where $Q$ is a variational distribution, $D_{KL}$ is the Kullback-Leibler distance, and $p(v_i)$ and $p(h_i \mid v_i)$ are the prior and posterior, respectively.

The constrained posterior of a bicluster is obtained by multiplying the input matrix by a vector, and subsequently rectifying and normalizing the code unit. To make the feature membership vectors and sample membership vectors sparse, a Laplace prior on the parameters of the original RFN model and a component-wise independent Laplace prior for the weights W are introduced. To get the final biclusters of the input matrix, we used threshold values H_thr and W_thr to filter out genes and samples in each bicluster as in [37].
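The thresholding step can be illustrated with a minimal sketch. It assumes the trained model yields one feature membership vector and one sample membership vector per bicluster, and that membership is decided by comparing absolute values against H_thr (features) and W_thr (samples). The function name `extract_biclusters` and the use of absolute values are illustrative assumptions, not details taken from [37].

```python
def extract_biclusters(feature_membership, sample_membership, H_thr, W_thr):
    """Turn membership vectors into (gene_indices, sample_indices) pairs.

    feature_membership: one list of per-gene values per candidate bicluster.
    sample_membership:  one list of per-sample values per candidate bicluster.
    Assumption: an entry belongs to the bicluster when its absolute
    value exceeds the corresponding threshold.
    """
    biclusters = []
    for f_vec, s_vec in zip(feature_membership, sample_membership):
        genes = [g for g, value in enumerate(f_vec) if abs(value) > H_thr]
        samples = [s for s, value in enumerate(s_vec) if abs(value) > W_thr]
        # Skip degenerate candidates with no genes or no samples left.
        if genes and samples:
            biclusters.append((genes, samples))
    return biclusters
```

Larger thresholds give smaller, tighter biclusters; candidates that lose all genes or all samples after filtering are discarded in this sketch.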

RFN can efficiently extract thousands of biclusters from a very large matrix. We ran RFN iteratively, and in each run only the single bicluster with the highest absolute mean Z-score and the smallest p-value was selected. After many iterations, a large number of biclusters is obtained. As in [39], we used the p-value of its most enriched biological pathway as the p-value of a bicluster. Specifically, the probability of having x genes of the same function in a bicluster of size n, given a total of N genes, can be computed using the following hypergeometric function:

$$
\Pr(x \mid N, p, n) = \frac{\binom{pN}{x}\binom{(1-p)N}{n-x}}{\binom{N}{n}} \tag{3}
$$

where p is the percentage of that pathway among all pathways in the full set of pathway terms. The p-value is defined in Eq. (4).

$$
p\text{-value} = 1 - \sum_{i=0}^{x-1} \frac{\binom{pN}{i}\binom{(1-p)N}{n-i}}{\binom{N}{n}} \tag{4}
$$
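As a worked illustration of the enrichment test, the sketch below computes the upper-tail hypergeometric probability of seeing at least x pathway genes in a bicluster of size n. It assumes the standard hypergeometric form in which pN of the N genes (rounded to an integer) are annotated to the pathway; the function name `enrichment_p_value` is hypothetical.

```python
from math import comb


def enrichment_p_value(x, N, p, n):
    """P(X >= x) for a hypergeometric draw of n genes out of N.

    Assumption: round(p * N) genes are annotated to the pathway.
    The upper tail is summed directly, which equals
    1 - sum_{i=0}^{x-1} P(X = i).
    """
    M = round(p * N)          # genes annotated to the pathway
    upper = min(n, M)         # largest possible number of hits
    favourable = sum(
        comb(M, i) * comb(N - M, n - i) for i in range(x, upper + 1)
    )
    return favourable / comb(N, n)
```

For example, with N = 20 genes, p = 0.25 (5 pathway genes), and a bicluster of n = 4 genes containing x = 2 pathway genes, the function returns 1205/4845, roughly 0.249.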

To get breast cancer-specific biclusters, only biclusters detected in breast cancer samples but not in normal samples are kept. As some genes belong to different functional categories, the biclusters extracted from a gene expression matrix should have overlap below a predefined threshold. Here, we used an empirical threshold of 0.5, as suggested in Orzechowski et al. [40].
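The overlap constraint can be sketched as a greedy filter. The text does not give the overlap measure, so the sketch assumes one common choice: the number of matrix cells shared by two biclusters divided by the size of the smaller one. It also assumes the biclusters arrive sorted from best to worst. Both `overlap` and `filter_overlapping` are hypothetical helper names.

```python
def overlap(a, b):
    """Shared cells divided by the cell count of the smaller bicluster.

    Each bicluster is a (gene_indices, sample_indices) pair.
    This overlap measure is an assumption made for illustration.
    """
    genes_a, samples_a = a
    genes_b, samples_b = b
    shared = len(set(genes_a) & set(genes_b)) * len(set(samples_a) & set(samples_b))
    smaller = min(len(genes_a) * len(samples_a), len(genes_b) * len(samples_b))
    return shared / smaller if smaller else 0.0


def filter_overlapping(biclusters, max_overlap=0.5):
    """Keep a bicluster only if it overlaps every kept one by less than max_overlap."""
    kept = []
    for candidate in biclusters:  # assumed sorted, best first
        if all(overlap(candidate, k) < max_overlap for k in kept):
            kept.append(candidate)
    return kept
```

Because the filter is greedy, the input order determines which of two heavily overlapping biclusters survives.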
The pseudo code of cancer-specific bicluster detection is given below:


In this method, the input consists of the breast cancer and normal combined expression matrices, EC and EN. The output is the set of breast cancer-specific biclusters. The parameters are n_hidden (number of latent variables to estimate), n_iter (number of iterations to run the algorithm), learnrateW (learning rate of the W parameter), learnratePsi (learning rate of the Psi parameter), dropout_rate (dropout rate for the latent variables), minP (minimal value for Psi), H_thr (the threshold value used to extract features belonging to a bicluster), and W_thr (the threshold value used to extract samples belonging to a bicluster).

2.3. Prioritization of bicluster coding genes and miRNAs

We propose to prioritize breast cancer-related coding genes and miRNAs by integrating four aspects of information (as shown in Fig. 1). Only coding genes and miRNAs in breast cancer-specific biclusters are considered. For a coding gene or miRNA in a bicluster, the average differential correlation value dci is defined in Eq. (5).

where N is the total number of genes (coding genes and miRNAs) in a bicluster, and $f_{ij} = 1$ if the changes in the correlation relationship between two genes and between two experimental conditions are both significant; otherwise, $f_{ij} = 0$. Fisher's z-test is used to test differential correlation between two conditions (normal and cancer). To test whether the two Pearson correlation coefficients in normal and cancer are significantly different, we transformed $r_N$ and $r_C$ into $Z_N$ and $Z_C$, respectively [41]. The Fisher's transformation of $r_N$ is defined in Eq. (6).

$$
Z_N = \frac{1}{2} \ln\!\left(\frac{1 + r_N}{1 - r_N}\right) \tag{6}
$$

Similarly, we transform $r_C$ to $Z_C$. We used Eq. (7) to test the difference between two correlations.

$$
Z = \frac{Z_N - Z_C}{\sqrt{\dfrac{1}{n_N - 3} + \dfrac{1}{n_C - 3}}} \tag{7}
$$

where nN and nC represent the sample sizes of normal and cancer samples, respectively. We used the local false discovery rate (fdr) in the fdrtool R package to test the significance [41,42].
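The differential correlation statistic can be sketched in a few lines. The example assumes the standard Fisher z-test: each Pearson correlation is transformed with the inverse hyperbolic tangent, and the difference is scaled by the standard error derived from the two sample sizes. The function names are hypothetical, and the subsequent local fdr step performed with fdrtool in R is not reproduced here.

```python
from math import atanh, sqrt


def fisher_z(r):
    """Fisher transformation: 0.5 * ln((1 + r) / (1 - r)), for -1 < r < 1."""
    return atanh(r)


def differential_correlation_z(r_normal, r_cancer, n_normal, n_cancer):
    """Z statistic for the difference between two Pearson correlations.

    Requires more than 3 samples per condition so that the
    standard error is defined.
    """
    if n_normal <= 3 or n_cancer <= 3:
        raise ValueError("each condition needs more than 3 samples")
    standard_error = sqrt(1.0 / (n_normal - 3) + 1.0 / (n_cancer - 3))
    return (fisher_z(r_normal) - fisher_z(r_cancer)) / standard_error
```

A gene pair whose correlation is identical in both conditions gives a statistic of zero, while a pair that flips from positive to negative correlation gives a large absolute value.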