1 Introduction

In many machine learning and statistics settings, we have a supervised learning problem where the outcome Y depends on a subset of the features X, potentially in complex ways, and we would like to identify these salient features. Take medical genetics as an example: the features are the genotypes at variants in the genome, and Y is a binary indicator for the presence/absence of disease. The true model could be that Y = f(X_S, ε), where S is the salient subset of the variants and ε is some noise/randomness. It is tremendously important to identify which features/variants are in S.

If we assume that f is simple, say a linear function, then we might hope to use the fitted parameters of a model to select salient features. For example, we might fit a Generalized Linear Model (GLM) on (X, Y) with a LASSO penalty to promote sparsity in the coefficients (Tibshirani, 1996), and subsequently select those features with non-zero coefficients. Step-wise procedures, where we sequentially modify the model, are another way of doing feature selection (Mallows, 1973).
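As an illustration (this is not the selection procedure studied in this paper), a minimal sketch of LASSO-style selection with scikit-learn might look as follows; the data shapes, coefficient values and penalty strength are arbitrary choices made for the example.

```python
# Illustrative sketch only: select features with non-zero coefficients from an
# L1-penalized logistic regression (a GLM with a LASSO penalty).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))           # n samples, d features
beta = np.zeros(50); beta[:5] = 1.0           # only the first 5 features matter
Y = (X @ beta + rng.standard_normal(500) > 0).astype(int)

lasso_glm = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_glm.fit(X, Y)
selected = np.flatnonzero(lasso_glm.coef_.ravel() != 0)
print(selected)  # indices of features with non-zero coefficients
```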

A clear limitation of this parametric approach is the need to have a good model for f. For the genetics example, there is no good model. Moreover, the standard feature selection methods are all plagued by correlations between the features: a feature that is not actually relevant for the outcome, i.e. not in S, can be selected by LASSO or a step-wise procedure because it is correlated with relevant features. In these settings we usually lack statistical guarantees on the validity of the selected features. Finally, even procedures with statistical guarantees usually depend on having valid p-values, which are based on a correct modeling of Y given X and (sometimes) assume some asymptotic regime. Yet there are many common settings where these assumptions fail and we cannot perform inference based on those p-values (Sur and Candès, 2018).

A powerful new approach called the Model-X knockoff procedure (Candès et al., 2018) has recently emerged to deal with these issues. This method introduces a new paradigm: we no longer assume any model for the distribution of Y given X in order to do feature selection (and therefore do not compute p-values), but we assume that we have full knowledge of the feature distribution, or at least that we can accurately model it, though there are some robustness results (Barber et al., 2018). This knowledge of the ground truth allows us to sample new knockoff variables X̃ satisfying some precise distributional conditions. Although we make no assumption on Y given X, we can use the knockoff procedure to select features while controlling the False Discovery Rate (FDR), which is the average proportion of the selected features that are not in S.

One major obstacle to the widespread application of knockoffs is its instability. The entire procedure depends sensitively on the knockoff sample X̃, which is random. Therefore, running the knockoff procedure twice may lead to very different selected sets of features. Our analysis in Section 4 shows that instability is particularly severe when the number of salient features (i.e. the size of S) is small, as is often the case. Also, whenever the number of features is very large, previous methods for generating knockoffs fail to consistently generate good samples for X̃, leading to inconsistent selection sets if several runs of the procedure are done simultaneously. Power also decreases drastically under those previous methods. This makes it challenging to confirm the selected variants in a replication experiment. Addressing the instability of knockoffs is therefore an important problem.

Our Contributions.

We generalize the standard (single) knockoff procedure to simultaneous multiple knockoffs (or multi-knockoffs for short). Our multi-knockoff procedure guarantees FDR control and has better statistical properties than the original knockoff, especially when the number of salient features is small. We propose a new entropy maximization algorithm to sample Gaussian multi-knockoffs. Our systematic experiments demonstrate that multi-knockoffs are more stable and more powerful than the original (single) knockoff. Moreover, we illustrate how multi-knockoffs can improve the ability to select causal variants in Genome Wide Association Studies (GWAS).

2 Background on Knockoffs

We begin by introducing the usual setting of feature selection procedures. We consider the data as a sequence of i.i.d. samples (X, Y) from some unknown joint distribution, where X is a d-dimensional feature vector and Y is the response. We then define the set of null features H_0 by j ∈ H_0 if and only if X_j is independent of Y conditionally on X_{-j} (where the subscript -j indicates all variables except the j-th, and bold letters indicate vectors). The non-null features, also called alternatives, are important because they capture the truly salient effects, and the goal of selection procedures is to identify them. Running the knockoff procedure gives us a selected set Ŝ, while controlling the False Discovery Rate (FDR), which is the expected rate of false discoveries: FDR = E[ |Ŝ ∩ H_0| / max(1, |Ŝ|) ]. The ratio inside the expectation is also called the False Discovery Proportion (FDP).

Assuming we know the ground truth for the distribution of X, the first step of the standard knockoff procedure is to obtain a knockoff sample X̃ that satisfies the following conditions:

Definition 2.1 (Knockoff sample).

A knockoff sample X̃ of a d-dimensional random variable X is a d-dimensional random variable such that two properties are satisfied:

  • Conditional independence: X̃ is independent of Y conditionally on X;

  • Exchangeability: (X, X̃)_swap(S) has the same distribution as (X, X̃) for any subset S of {1, ..., d},

where (X, X̃)_swap(S) refers to the vector in which the original j-th feature and the j-th knockoff feature have been swapped whenever j ∈ S.

The first condition is immediately satisfied as long as knockoffs are sampled conditionally on the sample X without considering any information about Y (which will be the case in our sampling methods, so we will not mention it again). The second condition ensures that the knockoff of each feature is sufficiently similar to the original feature in order to be a good comparison baseline. We also denote by X the n × d matrix where we stack the n i.i.d. d-dimensional samples into one matrix (this is acceptable as the i.i.d. assumption allows all sampling procedures to be done sample-wise).

The next step of the knockoff procedure constructs what we call feature statistics W = (W_1, ..., W_d), such that a high value for W_j is evidence that the j-th feature is non-null. Feature statistics described in Candès et al. (2018) depend only on [X, X̃] and Y, so that for each j we can write W_j = w_j([X, X̃], Y) for some function w_j. The only restriction these statistics must satisfy is the flip-sign property: swapping the j-th feature and its corresponding knockoff feature should flip the sign of W_j while leaving the other feature statistics unchanged. More formally, for a subset S of features, denoting [X, X̃]_swap(S) the data matrix where the original j-th variable and its corresponding knockoff have been swapped whenever j ∈ S, we have w_j([X, X̃]_swap(S), Y) = -w_j([X, X̃], Y) if j ∈ S, and w_j([X, X̃]_swap(S), Y) = w_j([X, X̃], Y) otherwise.

As suggested in Candès et al. (2018), the construction of feature statistics can be done in two steps: first, find a statistic Z = (Z_1, ..., Z_d, Z̃_1, ..., Z̃_d) where each coordinate measures the "importance" (hence we will call them importance scores) of the corresponding feature, either original or knockoff. For example, Z_j could be the absolute value of the regression coefficient of the j-th feature.

After obtaining the importance scores for the original and knockoff features, we take the difference to compute the feature statistic W_j = Z_j - Z̃_j. The intuition is that the importance scores of knockoffs serve as a control: a larger importance score of the j-th feature compared to that of its knockoff implies a larger positive W_j (and is therefore evidence against the null). Given some target FDR level q that we set in advance, we define the selection set Ŝ = { j : W_j ≥ T } based on the following threshold:

T = min { t > 0 : (1 + #{ j : W_j ≤ -t }) / max(1, #{ j : W_j ≥ t }) ≤ q }.

According to Theorem 3.4 in Candès et al. (2018), this procedure controls FDR at level q (it is actually called the Knockoff+ procedure). The mechanism behind this procedure is the Selective SeqStep+ introduced in Barber et al. (2015). The intuition is that we try to maximize the number of selections while bounding by q an upwardly biased estimate of the FDP, which is the fraction:

FDP̂(t) = (1 + #{ j : W_j ≤ -t }) / max(1, #{ j : W_j ≥ t }).

The added constant in the numerator is called the "offset", equal to one in our case; a different FDP estimate with offset equal to 0 leads to a slightly different procedure that controls a modified, less stringent version of the FDR. A short sketch of this selection step is given below.
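The sketch below illustrates the single-knockoff selection step, assuming importance scores have already been computed (for instance as absolute lasso coefficients on the augmented data [X, X̃]); the function name, example scores and target level are ours, and the offset is set to one as in the Knockoff+ procedure described above.

```python
import numpy as np

def knockoff_plus_select(Z, Z_tilde, q):
    """Select features from importance scores of original (Z) and knockoff
    (Z_tilde) features, bounding the FDP estimate with offset 1 by q."""
    W = Z - Z_tilde                      # feature statistics, one per feature
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:                 # search for the smallest valid threshold
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)       # no threshold works: select nothing

# Example with hypothetical importance scores for 6 features:
Z = np.array([5.0, 4.0, 0.1, 3.0, 0.2, 2.5])
Z_tilde = np.array([0.5, 0.3, 0.4, 0.2, 0.3, 0.1])
print(knockoff_plus_select(Z, Z_tilde, q=0.25))
```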

Instability of Knockoffs

If we generate multiple replication datasets, i.e. multiple versions of (X, Y), each of which is sampled from the common joint distribution, then the knockoff procedure guarantees that, on average, the proportion of false discoveries is less than the desired threshold. However, for a particular dataset, the selected features could be very different from those of another sample, as we empirically demonstrate in Section 4. There are settings where, in half of the experiments, the knockoff procedure selects a large number of features, and it selects zero features the other half of the time. This instability is a major issue if we want to ensure that the discoveries from data are reproducible.

The instability in the selected features is partially due to the randomness in (X, Y) and in the knockoff sample X̃. The knockoff procedure, equations (1)-(3), is also sensitive to the sample X̃. The knockoff selection set is based on a conservative estimate of the FDP given by equation (3). The threshold that determines the selected set requires this FDP estimate to be below some target FDR level q set in advance, which in turn requires us to select at least 1/q features due to the presence of the offset in the numerator. This requirement is a great source of instability of the knockoff procedure: whenever the number of non-nulls is close to that threshold value, we can end up either selecting a fairly large number of non-nulls, or not selecting any, even when the signal is strong. Our goal is to develop a new knockoff procedure which controls FDR and is more stable. We achieve this by introducing simultaneous multiple knockoffs, called multi-knockoffs for short, which extend the standard knockoff procedure.

3 Simultaneous Multiple Knockoffs

A Naive Flawed Approach to Multi-knockoffs

One approach to improving the stability of the selected features is to run the standard knockoff procedure multiple times in parallel and to take some type of consensus. This approach is flawed and does not control FDR. The reason is that by running the knockoff procedure multiple times in parallel, the symmetry between X and the knockoff samples is broken. To maintain symmetry and guarantee FDR control, we need to simultaneously sample multiple knockoffs. We will now make this more precise.

3.1 Multi-knockoff Selection Procedure

Fix a positive integer m, the multi-knockoff parameter (the usual single knockoff case corresponds to m = 1, for which all of our results are also valid). The goal is to extend the distributional properties of knockoffs in Definition 2.1 to settings where we simultaneously sample m knockoff copies X̃^1, ..., X̃^m of the same d-dimensional dataset X (where, again, X denotes either the d-dimensional random variable when making distributional statements or a matrix when referring to the feature set of n i.i.d. samples). As in the single knockoff setting, we can define an equivalence notion and notation for swapping multiple vectors. Instead of defining a swap for some subset of indices, we consider a collection of permutations σ = (σ_1, ..., σ_d) over the set of integers {0, 1, ..., m}, one for each of the d initial dimensions. Whenever we use multi-knockoffs, we index the original features by 0. We define the permuted vector (X, X̃^1, ..., X̃^m)_σ by applying, for each dimension j, the permutation σ_j to the m + 1 copies of that dimension. Each σ_j permutes the features corresponding to the j-th dimension of each vector, leaving the other dimensions unchanged. Once this generalized swap notion is defined, we extend the exchangeability property based on the invariance of the joint distribution under such transformations.

Definition 3.1.

We say that the concatenated vectors (X, X̃^1, ..., X̃^m) satisfy the extended exchangeability property if the equality in distribution (X, X̃^1, ..., X̃^m)_σ =_d (X, X̃^1, ..., X̃^m) holds for any collection of permutations σ as defined above.

Definition 3.2.

We say that (X̃^1, ..., X̃^m) is a multi-knockoff vector of X (or that they are multi-knockoffs of X) if the joint vector satisfies extended exchangeability and the conditional independence requirement that (X̃^1, ..., X̃^m) is independent of Y given X.

We will later on give examples of how to generate such multi-knockoffs. We state a lemma that is a direct generalization of Lemma 3.2 in Candès et al. (2018) and give a proof in Appendix B.1.

Lemma 3.1.

Consider a subset of nulls S ⊆ H_0. Define a generalized swap σ as above, where σ_j is the identity permutation whenever j ∉ S, and otherwise can be any permutation. Then we have the equality in distribution (X, X̃^1, ..., X̃^m)_σ =_d (X, X̃^1, ..., X̃^m) for a multi-knockoff.

Once the multi-knockoff vector is sampled, consider the joint vector (X, X̃^1, ..., X̃^m). As in the single knockoff setting, we construct importance scores Z = (Z^0, Z^1, ..., Z^m), where each Z^k is a d-dimensional vector (Z^0 corresponds to the original features). The importance scores are associated to the features in the following sense: if we generate importance scores on a swapped joint vector (as in Definition 3.1), then we obtain the same result as if we had swapped the importance scores of the initial joint vector. Common examples of such constructions are the absolute values of the coefficients associated to each feature when regressing Y on the concatenated features, possibly with a penalty for sparsity. Denoting an ordered sequence by indexing in parentheses (i.e. for any real-valued sequence a^0, ..., a^m, we have a^(0) ≥ a^(1) ≥ ... ≥ a^(m)), we can define feature-wise ordered importance scores Z_j^(0) ≥ Z_j^(1) ≥ ... ≥ Z_j^(m) for each feature j. For all j, define:

κ_j = argmax_{k ∈ {0, ..., m}} Z_j^k,    τ_j = Z_j^(0) - Z_j^(1).

We no longer have the possibility of generating feature statistics by taking an antisymmetric function of the importance scores. The extension to multi-knockoffs is done through these newly defined variables, by noticing the analogy that if m = 1 (single knockoff), then τ_j = |W_j| and κ_j corresponds to the sign of W_j (κ_j = 0 if and only if W_j > 0). In the single knockoff setting, the crucial distributional result is that, conditionally on the magnitudes |W_j|, the signs of the null W_j are i.i.d. coin flips. In the multi-knockoff case, the information encoded by the sign of W_j is contained in κ_j, which indicates whether, for a given dimension, the original feature has a higher importance score than those of its knockoffs. In Appendix B.4 we provide a geometric explanation for these choices of κ and τ. The crucial result is that the null κ_j behave uniformly and independently in distribution and can be used to estimate the number of false discoveries.

Lemma 3.2.

The random variables (κ_j)_{j ∈ H_0} are i.i.d. uniformly distributed on the set {0, 1, ..., m}, independent of the remaining variables (κ_j)_{j ∉ H_0} and of the feature-wise ordered importance scores. In particular, conditionally on the non-null κ_j and on the feature-wise ordered importance scores, the random variables (κ_j)_{j ∈ H_0} are i.i.d. uniformly distributed on {0, 1, ..., m}.

We prove this lemma in Appendix B.2. Following the steps that build the knockoff procedure as a particular case of the SeqStep+ procedure, we construct the following threshold that defines the rejection set of our multi-knockoff procedure, based on an FDP estimate.

Input: concatenated vector of importance scores, target FDR level q

Output: set Ŝ of selected features

1 for j = 1 to d do

2     compute the feature-wise ordered importance scores and set κ_j and τ_j

3 end for

   T = min { t > 0 : (1/m) · (1 + #{ j : κ_j ≥ 1, τ_j ≥ t }) / max(1, #{ j : κ_j = 0, τ_j ≥ t }) ≤ q }

return Ŝ = { j : κ_j = 0, τ_j ≥ T }

Algorithm 1: Multi-knockoff Selection Procedure

Essentially, the multi-knockoff procedure returns the features where the original feature has a higher importance score than any of its knockoffs (κ_j = 0), and where the gap with the second largest importance score is above some threshold. A minimal sketch of this selection rule is given below.
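The following Python sketch implements the selection rule under our reading of Algorithm 1: κ_j records which copy (0 for the original) has the largest importance score, τ_j is the gap to the second largest, and the FDP estimate uses the 1/m offset discussed below. Function names, variable names and the example values are ours.

```python
import numpy as np

def multi_knockoff_select(Z, q):
    """Z has shape (m + 1, d): row 0 holds importance scores of the original
    features, rows 1..m those of the m knockoff copies. Returns selected indices."""
    m = Z.shape[0] - 1
    kappa = np.argmax(Z, axis=0)                 # which copy wins for each feature
    Z_sorted = np.sort(Z, axis=0)[::-1]          # ordered scores, largest first
    tau = Z_sorted[0] - Z_sorted[1]              # gap between the top two scores

    for t in np.sort(tau[tau > 0]):              # candidate thresholds
        numerator = (1.0 / m) * (1 + np.sum((kappa != 0) & (tau >= t)))
        denominator = max(1, np.sum((kappa == 0) & (tau >= t)))
        if numerator / denominator <= q:
            return np.flatnonzero((kappa == 0) & (tau >= t))
    return np.array([], dtype=int)

# Example with m = 2 knockoffs and 5 features (hypothetical scores):
Z = np.array([[4.0, 3.5, 0.2, 3.0, 0.1],        # originals
              [0.3, 0.2, 0.4, 0.1, 0.6],        # first knockoff copy
              [0.2, 0.4, 0.1, 0.2, 0.3]])       # second knockoff copy
print(multi_knockoff_select(Z, q=0.2))
```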

Proposition 3.3.

Fix a target FDR level q. The procedure that selects the features in the set Ŝ given by Algorithm 1 controls FDR at level q.

We prove this result in Appendix B.3. One advantage of the multi-knockoff selection procedure lies in the new value of the offset parameter. By averaging over the multi-knockoffs, we are able to decrease the minimum number of rejections from 1/q to 1/(mq), leading to an improvement in power and stability. We call 1/(mq) the detection threshold of the multi-knockoff. We experimentally confirm these results in Section 4.

3.2 Gaussian Multi-knockoffs Based on Entropy Maximization

Most of the research and applications have focused on generating standard knockoffs when X comes from a multivariate Gaussian distribution N(μ, Σ), although more universal sampling algorithms exist, for which we provide in Appendix A a generalization to multi-knockoffs. Here we extend the existing procedures for Gaussian knockoffs to generate Gaussian multi-knockoffs for any m. A sufficient condition for (X̃^1, ..., X̃^m) to be a multi-knockoff vector of X is for (X, X̃^1, ..., X̃^m) to be jointly Gaussian such that: 1) all the X̃^k have the same mean μ as X; and 2) the covariance matrix has the block form in which every diagonal block equals Σ and every off-diagonal block equals Σ - S,

where S is a diagonal matrix chosen so that this joint matrix is positive semi-definite, ensuring that it is a valid covariance.

Proposition 3.4.

If (X, X̃^1, ..., X̃^m) has the mean and covariance structure given above, then (X̃^1, ..., X̃^m) is a valid multi-knockoff of X.

If a diagonal term of S is zero, then the corresponding knockoff coordinates are equal to the original feature almost surely. This generates a valid multi-knockoff, but it has no power to discover the j-th feature (regardless of whether it is null or non-null), since each multi-knockoff is indistinguishable from the original feature. The general intuition is that the more independent the knockoffs are from the original X, the greater the power of discovering the non-null features (Candès et al., 2018). Therefore, previous work for the standard single knockoff (corresponding to m = 1) has focused on finding S as large as possible in some sense, while maintaining the positive semi-definiteness of the joint covariance matrix.

To construct S for Gaussian multi-knockoffs, we suggest maximizing the entropy of the joint Gaussian vector (which has a simple closed form for Gaussian distributions). In the single knockoff case, this is equivalent to minimizing the mutual information between X and X̃, as suggested in Candès et al. (2018). Indeed, I(X; X̃) = H(X) + H(X̃) - H(X, X̃), and H(X) and H(X̃) do not depend on S, hence the equivalence.

Entropy Knockoffs. The diagonal matrix S for constructing entropy multi-knockoffs is given by the following convex optimization problem: maximize the log-determinant of the joint covariance matrix described above, over diagonal matrices S that keep it positive semi-definite.

This is a convex optimization problem, by noticing that the negative log-determinant is convex on the feasible set. It can be solved efficiently, and our implementation is based on the Python package CVXOPT (M. S. Andersen and Vandenberghe, 2012). A short sketch of this optimization is given below.
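For concreteness, the block structure of the joint covariance implies that its determinant factors as det((m+1)Σ - mS) · det(S)^m, so the entropy objective can be written explicitly in terms of S. The sketch below solves the resulting log-determinant problem with cvxpy (our choice for brevity; the paper's implementation uses CVXOPT); it is a sketch under that factorization, not a drop-in replacement for the original code.

```python
# Sketch: entropy-maximizing diagonal matrix S for Gaussian multi-knockoffs,
# using the factorization det(G) = det((m+1)*Sigma - m*S) * det(S)^m of the
# joint covariance determinant.
import numpy as np
import cvxpy as cp

def entropy_diagonal(Sigma, m):
    d = Sigma.shape[0]
    s = cp.Variable(d, nonneg=True)
    objective = cp.Maximize(
        m * cp.sum(cp.log(s)) + cp.log_det((m + 1) * Sigma - m * cp.diag(s))
    )
    # log_det implicitly keeps (m+1)*Sigma - m*diag(s) positive definite.
    problem = cp.Problem(objective)
    problem.solve()
    return s.value

# Small example with a 3 x 3 correlation matrix (values chosen arbitrarily):
Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
print(entropy_diagonal(Sigma, m=2))
```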

This knockoff construction method avoids solutions where diagonal terms are extremely close to 0, and we provide in Appendix A a lower bound on the diagonal terms of S in terms of the smallest (positive) eigenvalue of the covariance matrix. The fact that we maximize the determinant of the joint covariance matrix implies that we avoid having any extremely small eigenvalue, hence this bound proves useful. We provide additional analysis on the formulation of entropy maximization as a convex optimization problem and prove this lower bound in Appendix A. Once the diagonal matrix S is computed, we can generate the Gaussian multi-knockoffs by writing the conditional distribution of (X̃^1, ..., X̃^m) given the original features X.
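As an illustration of this last step, the sketch below builds the joint covariance described above and samples the m knockoff copies from the conditional Gaussian distribution given X, using the generic Gaussian conditioning formula; it assumes S has already been computed (for instance by the entropy construction) and that the joint covariance is positive definite. All names are ours.

```python
# Sketch: sample m Gaussian multi-knockoff copies of each row of X, given the
# diagonal terms s (1-D array). Assumes the joint covariance is positive
# definite; no numerical safeguards are included.
import numpy as np

def sample_gaussian_multi_knockoffs(X, mu, Sigma, s, m, rng):
    n, d = X.shape
    S = np.diag(s)
    # Joint covariance of (X, X_tilde^1, ..., X_tilde^m): diagonal blocks Sigma,
    # off-diagonal blocks Sigma - S.
    G = np.kron(np.ones((m + 1, m + 1)), Sigma - S) + np.kron(np.eye(m + 1), S)
    G11 = G[:d, :d]                      # = Sigma
    G21 = G[d:, :d]                      # cross-covariances with X
    G22 = G[d:, d:]                      # covariance of the knockoff block
    A = G21 @ np.linalg.inv(G11)         # regression coefficients of knockoffs on X
    cond_cov = G22 - A @ G[:d, d:]       # conditional covariance given X
    L = np.linalg.cholesky(cond_cov)
    knockoffs = np.empty((n, m, d))
    for i in range(n):
        cond_mean = np.tile(mu, m) + A @ (X[i] - mu)
        z = cond_mean + L @ rng.standard_normal(m * d)
        knockoffs[i] = z.reshape(m, d)
    return knockoffs                      # shape (n, m, d)

# Hypothetical usage, with s from any of the constructions discussed above:
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
mu = np.zeros(2)
X = rng.multivariate_normal(mu, Sigma, size=5)
s = np.array([0.8, 0.8])
print(sample_gaussian_multi_knockoffs(X, mu, Sigma, s, m=2, rng=rng).shape)
```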

Figure 1: Comparison between methods for generating Gaussian knockoffs. We plot the densities of the distributions of the diagonal terms generated by each method. The dimension of the covariance matrix is 60.

In the single knockoff setting (m = 1), the standard approach in the literature is to solve a semidefinite program (SDP) to optimize S. An alternative approach in the literature, called equicorrelation, is to restrict S to be a multiple of the identity and solve for that common value (where we consider Σ as a correlation matrix, the goal being to have the same correlation between original features and knockoffs in every dimension). We provide natural generalizations of the SDP and equicorrelated constructions to optimize the matrix S for multi-knockoffs (see Appendix A). The SDP knockoffs are based on an optimization problem that promotes sparsity: the fact that the objective function is an L1-distance between the identity matrix and the diagonal matrix S implies that many diagonal terms of the optimal solution will be set nearly equal to 0. In addition, Candès et al. (2018) noticed that the equicorrelated knockoff method tends to have very low power, as in high dimensions the diagonal terms of S are proportional to the lowest eigenvalue of the covariance matrix Σ, which in high-dimensional settings tends to be extremely small. Currently, SDP knockoffs are chosen by default.

We perform experiments to demonstrate the advantage of entropy over SDP and equicorrelation. We randomly generate correlation matrices with the function make_spd_matrix from the Python package scikit-learn (Pedregosa et al., 2011), and compute the diagonal matrix S with the SDP and equicorrelated methods (the diagonal terms of S are necessarily in the interval [0, 1], so that we can compare them across several runs). The lower a diagonal term is, the higher the correlation between the original feature and its corresponding knockoff, and the less powerful the knockoff. In Figure 1, we plot the density of the distribution of the logarithm of the diagonal terms of S (which we estimate with the empirical distribution based on 50 runs).

We see that a significant proportion of diagonal terms based on the SDP construction take extremely small values, several orders of magnitude smaller than the rest, which effectively behave as zero whenever we sample knockoffs; the corresponding features are essentially undiscoverable because their knockoffs are too similar. In Appendix C.1 we show more such comparisons of the distribution of the diagonal terms for varying dimensions of the correlation matrices and strengths of the correlation. More concerning in the SDP construction case, the set of almost-zero diagonal terms is very unstable to perturbations of the correlation matrix. We report in Appendix C.2 the simulations showing the instability of these sets. The result is simple: the Jaccard similarity between two sets of SDP undiscoverable features, generated from two empirical covariance matrices obtained from two batches of i.i.d. samples, is on average very low. That is, two parallel runs of the knockoff procedure on different datasets coming from the exact same original distribution lead to different sets of undiscoverable features.

The equicorrelated construction does not suffer from this issue, although its diagonal terms tend to be smaller compared to the SDP diagonal terms that are not almost 0. The SDP construction, due to its objective function, maximizes some diagonal terms at the expense of many others that are effectively set to 0, whereas the equicorrelated construction treats all coordinates more equally.

Finally, the entropy construction achieves the best performance: the diagonal terms it constructs are generally a couple of orders of magnitude higher than those of the equicorrelated method, and, compared to SDP, the entropy construction does not generate almost-zero terms, so it does not create any catastrophic knockoff. This means that entropy knockoffs will generally have higher power than those from the other methods, and, on top of that, the whole procedure will be more stable: the selection set will not vary from one run to another because the set of undiscoverable features changes completely.

4 Experiments

We first conduct systematic experiments on synthetic data, so that we know the ground truth. For each experiment, we evaluate both the power and the stability of multi-knockoffs and the standard knockoff. Then we evaluate the performance of knockoffs on a real dataset from Genome Wide Association Studies (GWAS).

4.1 Analyzing Improvements with Synthetic Data

We run simulations with synthetic data to confirm the threshold phenomenon and the improvements brought by multi-knockoffs. We randomly generate a feature matrix X from a random covariance matrix, fix a number of non-nulls, and create a binary response Y based on a logistic response of a weighted linear combination of the non-null features. Then, we sample multi-knockoffs with m = 1 (single knockoff), m = 2 and m = 3 from that same X, and run the knockoff procedure based on a logistic regression to obtain a selection set, along with values for the power and an FDP. We then repeat this whole procedure 50 times to obtain estimates of the variance and get an empirical FDR. Knockoffs are generated with the entropy construction to show that our multi-knockoff based improvement is made on top of the entropy improvement (which is specific to Gaussian knockoffs). A sketch of this simulation setup is given below.
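The following minimal sketch covers only the data-generation part of this setup (the knockoff sampling and selection steps are as described in the previous sections); dimensions, the number of non-nulls and the coefficient sizes are arbitrary choices made for the example.

```python
# Sketch of the synthetic-data setup: Gaussian features with a random covariance,
# a small set of non-null features, and a binary response from a logistic model.
import numpy as np
from sklearn.datasets import make_spd_matrix

rng = np.random.default_rng(0)
n, d, n_non_nulls = 1000, 60, 10

Sigma = make_spd_matrix(d, random_state=0)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

beta = np.zeros(d)
non_nulls = rng.choice(d, size=n_non_nulls, replace=False)
beta[non_nulls] = rng.choice([-1.0, 1.0], size=n_non_nulls) * 2.0

p = 1.0 / (1.0 + np.exp(-X @ beta))   # logistic response probabilities
Y = rng.binomial(1, p)                 # binary outcome
```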

Figure 2: Power and FDR comparison between single knockoffs and multi-knockoffs. Multi-knockoffs with m = 2 and m = 3 have greater power than the standard knockoff when the number of non-nulls is small. All three methods control FDR.
Figure 3: Improvement in stability with multi-knockoffs: density of non-nulls by selection frequency, when X has low correlation (upper figure) and when X has high correlation (lower figure). The x-axis of each plot is the frequency with which a non-null feature is selected, and the y-axis indicates density.

We set our target FDR level q in advance, and compare the single knockoff setting with multi-knockoffs (m = 2 and m = 3) over a range of numbers of non-null features. We report our results in Figure 2. We first point out that FDR is controlled in all the experiments, as expected. We then compute the detection thresholds 1/(mq) for each setting: single knockoffs (m = 1) require the largest minimum number of rejections, and the threshold decreases as m grows. By plotting power as a function of the number of non-nulls, we clearly confirm this threshold behavior. All three settings reach a high-power regime whenever the number of non-nulls exceeds the expected detection threshold. This shows the advantage of using multi-knockoffs in settings where we expect a priori the number of non-nulls to be small, and want to make sure that our method has a chance of selecting such a small set of non-nulls. We also see that there is a small price to pay for using multi-knockoffs. Whenever the number of non-null features increases so that we are beyond the detection thresholds, power decreases with the number of multi-knockoffs. This is due to the fact that sampling multi-knockoffs imposes a more stringent constraint when constructing the knockoff conditional distribution (cf. Appendix A), and therefore multi-knockoffs can have slightly "worse" power as m increases.

Finally, multi-knockoffs not only substantially improve the power of the procedure in settings with a small number of non-nulls; they also help stabilize the procedure. We plot in Figure 3, as a function of the selection frequency, the density of the distribution of the non-nulls. In order to get the selection frequency, we run the same procedure as before, except that this time we sample 200 (multi-)knockoffs from one same X and run the procedure each time, keeping that same X and the response Y we had generated. This allows us to compute how frequently each non-null is selected (by repeatedly sampling knockoffs from the same X, FDR is no longer controlled; the point of these simulations is to stress the improvement in stability). For different settings where we vary the number of non-nulls, we see that the multi-knockoffs consistently reject a large fraction of the non-nulls whenever the detection threshold is attained. In contrast, most non-null features are selected by the standard knockoff at low frequency, indicating instability.

The key aspect here is that the improvement in power whenever a threshold is crossed is not due to an overall increase in the selection frequency of all the non-nulls: the densities in the above figures do not concentrate around intermediate selection frequency values. That is, multi-knockoffs do not increase power by increasing instability.

4.2 Applications: GWAS Causal Variants

Figure 4: Power and FDR comparison between single knockoffs, multi-knockoffs, and top correlation for a GWAS dataset.

We apply our stabilizing procedures to fine-mapping the causal variants in a genome-wide association study (GWAS). These studies scan the whole genome in search of single nucleotide polymorphisms (SNPs) that are associated with a particular phenotype. In practice, they compute correlation scores for each SNP with respect to the phenotype, and select those beyond a certain significance threshold. Often, the high correlation between SNPs (called linkage disequilibrium) implies that a large number of consecutive SNPs have a large association score and are thus selected. Fine-mapping consists in finding the precise causal SNPs that actually help explain the phenotype. Knockoffs can be useful in this setting, but the threshold phenomenon described earlier is an impediment to their application. We want to analyze several dozen, maybe hundreds of SNPs that have passed the selection threshold of the GWAS. However, the number of true causal SNPs may be very low, possibly less than 10. If we set a low target FDR level, the single knockoff procedure may be unable to make any detection.

We follow the lines of Hormozdiari et al. (2016, 2014) and run simulations analogous to those presented in Figure 2, where the features now correspond to individual genotypes. As it is not possible to actually know, for a given phenotype, which are the true causal SNPs without experimental confirmation, we generate synthetic responses (phenotypes) by randomly choosing a given number of SNPs as causal. Such semi-synthetic data (real X and simulated Y) is standard in the literature (Hormozdiari et al., 2014). In addition, we run a selection procedure without statistical guarantees that is commonly used: we pick the SNPs most correlated with the response. We give more details in Appendix C.3. We recover the results obtained with synthetic data and report them in Figure 4: FDR is controlled with the multi-knockoff procedure, and the top correlation method fails to control FDR. We also observe the detection threshold effect: for a low number of causal SNPs, single knockoffs have almost no power, and multi-knockoffs have better power than picking the most correlated SNPs. Note that here we use real SNP data from 2000 individuals, and we approximate the SNP distribution by a Gaussian in order to apply multi-knockoffs (SNPs take values in {0, 1, 2}). The multi-knockoff is robust enough that FDR is controlled even with this model mismatch.

5 Discussion

In this paper, we propose multi-knockoffs, an extension of the standard knockoff procedure. We show that multi-knockoffs guarantee FDR control, and demonstrate how to generate Gaussian multi-knockoffs via a new entropy-maximization algorithm. Our extensive experiments show that multi-knockoffs are more stable and more powerful than the standard (single) knockoff. Finally, we illustrate on the important problem of identifying causal GWAS mutations that multi-knockoffs substantially outperform the popular approach of selecting mutations with the highest correlation with the phenotype. The main contribution of this paper is in proposing the mathematical framework of multi-knockoffs; additional empirical analysis and applications are an important direction of future work.

Acknowledgments

J.R.G. was supported by a Stanford Graduate Fellowship. J.Z. is supported by a Chan–Zuckerberg Biohub Investigator grant and National Science Foundation (NSF) Grant CRII 1657155. The authors thank Emmanuel Candès and Nikolaos Ignatiadis for helpful discussions.

References

  • Tibshirani (1996) Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
  • Mallows (1973) Colin L Mallows. Some comments on Cp. Technometrics, 15(4):661–675, 1973.
  • Sur and Candès (2018) Pragya Sur and Emmanuel J Candès. A modern maximum-likelihood theory for high-dimensional logistic regression. arXiv preprint arXiv:1803.06964, 2018.
  • Candès et al. (2018) Emmanuel Candès, Yingying Fan, Lucas Janson, and Jinchi Lv. Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2018.
  • Barber et al. (2018) Rina Foygel Barber, Emmanuel J Candès, and Richard J Samworth. Robust inference with knockoffs. arXiv preprint arXiv:1801.03896, 2018.
  • Barber et al. (2015) Rina Foygel Barber, Emmanuel J Candès, et al. Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5):2055–2085, 2015.
  • M. S. Andersen and Vandenberghe (2012) M. S. Andersen, J. Dahl, and L. Vandenberghe. CVXOPT: A Python package for convex optimization, version 1.1.5. 2012.
  • Pedregosa et al. (2011) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • Hormozdiari et al. (2016) Farhad Hormozdiari, Martijn van de Bunt, Ayellet V Segrè, Xiao Li, Jong Wha J Joo, Michael Bilow, Jae Hoon Sul, Sriram Sankararaman, Bogdan Pasaniuc, and Eleazar Eskin. Colocalization of GWAS and eQTL signals detects target genes. The American Journal of Human Genetics, 99(6):1245–1260, 2016.
  • Hormozdiari et al. (2014) Farhad Hormozdiari, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc, and Eleazar Eskin. Identifying causal variants at loci with multiple signals of association. Genetics, pages genetics–114, 2014.
  • Lei and Fithian (2018) Lihua Lei and William Fithian. AdaPT: an interactive procedure for multiple testing with side information. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(4):649–679, 2018.
  • Lei et al. (2017) Lihua Lei, Aaditya Ramdas, and William Fithian. STAR: A general interactive framework for FDR control under structural constraints. arXiv preprint arXiv:1710.02776, 2017.
  • Ignatiadis et al. (2016) Nikolaos Ignatiadis, Bernd Klaus, Judith B Zaugg, and Wolfgang Huber. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods, 13(7):577, 2016.
  • Consortium et al. (2015) 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature, 526(7571):68, 2015.


Appendix

Appendix A Sampling Multiple Knockoffs

A.1 Gaussian Multi-knockoffs

We generalize the knockoff generation procedure to sample multi-knockoffs, starting with the Gaussian case. We saw that a sufficient condition for (X̃^1, ..., X̃^m) to be a multi-knockoff vector (besides all vectors having the same mean) is that the joint vector has the block covariance structure of Section 3.2, with diagonal blocks Σ and off-diagonal blocks Σ - S.

We can easily generalize previous diagonal matrix constructions to the multi-knockoff setting. The mathematical formulation of the heuristic behind SDP and equicorrelated knockoffs, as an objective function in the convex optimization problem, does not change when sampling multi-knockoffs, as the correlation between an original feature and any of its multi-knockoffs is the same by exchangeability. However, the positive semi-definite constraint that defines the feasible set changes with m. For the entropy knockoffs, the objective function also depends on m.

Proposition A.1.

We generalize the diagonal construction methods SDP, equicorrelated and entropy to sampling m multi-knockoffs from a multivariate Gaussian, via the following convex optimization problems. We recover the formulations for the single knockoff setting by setting m = 1.

  • SDP Multi-knockoffs. For a covariance matrix Σ whose diagonal entries are equal to one, the diagonal matrix S = diag(s) for constructing SDP knockoffs is given by the following convex optimization problem: minimize the sum of |1 - s_j| over j, subject to s_j ≥ 0 for all j and (m+1)/m · Σ - diag(s) positive semi-definite.

  • Equicorrelated Multi-knockoffs. For a covariance matrix Σ whose diagonal entries are equal to one, the diagonal matrix for constructing equicorrelated knockoffs is given by the convex optimization problem of maximizing the common diagonal value subject to the same positive semi-definiteness constraint.

    The solution of this optimization problem has a closed form expression: s_j = min( (m+1)/m · λ_min(Σ), 1 ), where λ_min(Σ) is the smallest (positive) eigenvalue of Σ.

  • Entropy Multi-knockoffs. The diagonal matrix for constructing entropy knockoffs is given by the convex optimization problem (the negative log-determinant being convex) of maximizing the log-determinant of the joint covariance matrix subject to the same constraints.

The entropy knockoff construction method avoids solutions where diagonal terms are extremely close to 0, and the proof below derives a lower bound on the diagonal terms of S in terms of λ_min(Σ), the smallest (positive) eigenvalue of Σ.

For the SDP method and the equicorrelated method, increasing the number of multi-knockoffs shrinks the feasible set of the convex optimization problem. However, diagonal terms can always be arbitrarily close to 0, and we empirically observe a slight decrease in power as we increase m, indicating that the added constraints limit the choice of "good" values for the diagonal terms.

Proof.

The heuristic behind the different construction methods looks for optimal solutions to different convex optimization problems. Depending on the multi-knockoff parameter m, we need to adapt two parts of the convex optimization formulations: the objective function and the feasible set. Objective functions in the SDP and equicorrelated constructions remain unchanged, as they do not depend on the number of multi-knockoffs.

Adapting the Feasible Set

We first look at how the constraints defining the feasible set change as we go from single knockoffs to multi-knockoffs. All three methods (SDP, equicorrelated, entropy) define the feasible set for S by constraining the joint covariance matrix to be positive semi-definite. We show that this constraint is equivalent to S being dominated by (m+1)/m · Σ in the positive semi-definite order, which we prove by induction on m. Suppose that the equivalence holds at step m, for any positive definite matrix Σ and positive definite diagonal matrix S. Writing the joint covariance matrix at step m + 1 in block form and taking the Schur complement of the upper-left block, we obtain the recursive step and conclude the proof.

Objective Function for Entropy Construction

In addition to this, we need to formulate the objective function for the entropy construction. The entropy of a multivariate Gaussian has a simple closed formula in terms of the log-determinant of its covariance matrix.

We rearrange the expression of the determinant of the joint covariance matrix to show that maximizing the entropy is equivalent to maximizing the log-determinant objective stated in Proposition A.1 (we showed in the main text that maximizing the entropy in a Gaussian setting is equivalent to maximizing this log-determinant). In order to do so, it suffices to show by induction that the corresponding factorization of the determinant holds for all m, where the multiplicative constant is a real number depending only on m. We first show this for m = 1. Suppose the result holds for a given m. Writing the joint covariance at step m + 1 in block form, we obtain the result, where the multiplicative constant is the same as before. We used the following two formulae to compute determinants of block matrices:

  • det([[A, B], [C, D]]) = det(A) · det(D - C A^{-1} B) when A is invertible (Schur complement formula);
  • det([[A, B], [C, D]]) = det(AD - BC) when C and D commute and all the blocks are square matrices.

Lower Bound for Diagonal Terms in Entropy Construction

For the entropy construction, in order to give a lower bound on the diagonal terms of S, we derive an expression for the solution of the minimization problem. Without loss of generality, we compute the partial derivative of the objective with respect to the first diagonal term. Using Jacobi's formula for the derivative of a determinant, and the fact that the derivative of the joint covariance matrix with respect to a diagonal entry is a matrix whose only non-zero terms, equal to one, sit at the corresponding positions, we obtain the gradient of the objective. Setting this expression to 0, we get a stationarity condition satisfied by the solution of the convex optimization problem.

Now we can write the corresponding diagonal term of the inverse matrix as a quotient between two determinants, the numerator being the principal minor obtained by removing the corresponding row and column. As both determinants can be written as products of eigenvalues, the Cauchy interlacing theorem gives the stated lower bound, where λ_min(Σ) is the smallest (positive) eigenvalue of Σ. ∎

A.2 General Multi-knockoff Sampling Based on SCIP

We can also generalize to the multi-knockoff setting a universal (although possibly intractable) knockoff sampling algorithm introduced in Candès et al. (2018): the Sequential Conditional Independent Pairs (SCIP) procedure. Fix the number m of multi-knockoffs to sample (so that SCIP corresponds to m = 1). We iterate over the features j = 1, ..., d, at each step sampling m knockoffs for the j-th feature, independently of one another, from the conditional distribution of the original feature given all the variables sampled so far. We formulate this in Algorithm 2 and prove that the resulting samples satisfy exchangeability.

1 for j = 1 to d do

2     for k = 1 to m do

3         Sample X̃_j^k from the conditional distribution of X_j given all variables sampled so far, independently of the previously sampled knockoffs of X_j

4     end for

5     Add X̃_j^1, ..., X̃_j^m to the conditioning variables

6 end for

return (X̃^1, ..., X̃^m)

Algorithm 2: Sequential Conditional Independent Multi-knockoffs
Proof.

We need to show the following equality in distribution, using the notation of Definition 3.1: for any collection of permutations σ, the swapped joint vector has the same distribution as the original one.

We follow the same proof as in Candès et al. (2018), with the following induction hypothesis:

Induction Hypothesis:

After j steps, extended exchangeability holds for the first j dimensions,

where the swap is defined with arbitrary permutations over those dimensions. After the first step, the equality holds for j = 1, given that all the sampled knockoffs of the first feature have the same conditional distribution and are independent of one another. Now, if the hypothesis holds at step j, then at step j + 1 the joint distribution of all variables can be decomposed as a product of conditional distributions given the sampling procedure, so that we have:

Now, by the induction hypothesis, the expression in the denominator satisfies extended exchangeability for the first j dimensions (we marginalize out the variables that do not matter, as at step j + 1 the permutations act on the first j + 1 dimensions). So do the terms in the numerator, as again we only permute elements among the first j dimensions. And, because of the conditional independent sampling, the numerator expression is also exchangeable for the (j + 1)-th dimension. In conclusion, the joint vector is exchangeable for the first j + 1 dimensions, concluding the proof. ∎

Appendix B Proofs

B.1 Proof of Lemma 3.1

Proof.

Given that the generalized swap is the composition of the action of each permutation σ_j, and that we can write each σ_j as a composition of transpositions, we see that it is enough to show the result for a simple transposition of two features (original or multi-knockoff) corresponding to a null dimension. This leads us directly to the proof of Lemma 3.2 in Candès et al. (2018), where the only difference is that we add all the extra multi-knockoffs to the conditioning set. ∎

B.2 Proof of Lemma 3.2

Proof.

Consider any collection of permutations σ on the set {0, 1, ..., m}, and for each non-null index j set σ_j to be the identity permutation. In order to prove the result, we need to show the following equality in distribution:

Using the notation for the extended swap, this is equivalent to a statement about the swapped joint vector, where for each null index j the original feature and its knockoffs have been permuted according to σ_j (and the non-nulls remained in place). By construction, the importance score function is a function of the joint vector and Y which associates to each feature a "score" for its importance (for simplicity we denote by Z the whole concatenated vector of scores). The choice of such a function is restricted so that swapping features and then computing scores is the same as computing scores and then swapping them. By the multi-knockoff exchangeability property, and our specific choice of σ that does not permute non-null features, the swapped joint vector also has the same distribution as the original one. This in turn implies:

Also, given that the permutation is done feature-wise, the feature-wise ordered importance scores remain the same.

We now prove the equality in distribution (with a slight abuse of notation for representing set probabilities):

The second equality is due to the equality in distribution between the original and swapped score vectors, and the third equality makes use of the fact that for any j the order statistics of the original and permuted scores are the same. The statement about our variables κ and τ holds because they are functions of the feature-wise ordered importance scores. ∎

B.3 Proof of Proposition 3.3

Proof.

The random variables κ_j allow us to construct one-bit p-values as in Barber et al. (2015), while the τ_j can be used to determine the ordering in which we sort those p-values, given that, conditionally on the τ_j, the null κ_j are i.i.d. uniform over {0, ..., m} and independent of the non-nulls. We can therefore permute the dimension indices based on decreasing τ_j and still define the following random variables with the desired properties. We expect that our ordering based on τ will tend to place non-nulls at the beginning. Set, for each j:

The distributional results for κ imply that the null p-values are also i.i.d., independent of the non-null p-values and of the ordering, and have the following distribution:

In particular, the null p-values satisfy the super-uniformity condition required by Selective SeqStep+. Fix a target FDR level q and a constant c in (0, 1). Following Barber et al. (2015), define the Selective SeqStep+ threshold:

Then, according to Theorem 3 in Barber et al. (2015), the procedure that selects the corresponding features controls FDR at level q. For our particular choice of the constant c, we have:

Now, instead of maximizing over an index along a decreasing sequence of τ values, one can formulate the problem as minimizing the threshold T:

The selection set is then defined as:

We observe that the main role of τ is to determine an ordering of the p-values for the SeqStep+ procedure. Any function of the ordered statistics gives valid statistics that can be used to order the p-values, given that the distributional restrictions will still be satisfied. A rich literature covers this topic (Lei and Fithian, 2018; Lei et al., 2017; Ignatiadis et al., 2016), and could be applied to multi-knockoff based p-values.

B.4 Intuition for the Choice of Kappa and Tau

We illustrate the particular choice of κ and τ from a geometric point of view. For single knockoffs, one can pair the importance scores of each original feature and its knockoff and plot such pairs (Z_j, Z̃_j) as points in the plane. We then have a geometric view of the threshold selection. The two parallel lines given by the equations Z̃ = Z + t and Z̃ = Z - t partition the plane into three sections. The two counts in the FDP estimate are obtained by counting the number of points in the section above the upper line (that is, points with Z̃_j ≥ Z_j + t) and below the lower line (that is, points with Z_j ≥ Z̃_j + t). For t = 0, the two lines coincide and the plane is partitioned by the line Z̃ = Z.

The same picture extends to higher dimensions, where we partition the space into cones according to which coordinate is the largest. Our method for choosing a threshold for multi-knockoffs proceeds as before: for a given t, we count the number of points in each translated cone and compare the count in the cone corresponding to the original feature to the average over those corresponding to the knockoffs. We then find the minimum t subject to the FDP estimate constraint. Reformulating this gives our variables κ and τ.

Appendix C Supplement on Simulations

C.1 Comparison Between Distributions of Diagonal Construction Methods

We run another simulation where we increase the dimension of the samples. We plot again the distribution of the logarithm of the diagonal terms for the three construction methods in Figure 5. As we increase the dimension, we observe that the distributions shift towards more negative values, indicating that the constructed diagonal coefficients tend to be smaller. This is especially the case for the equicorrelated construction. The SDP construction generates an even higher proportion of almost-zero diagonal terms as we increase the dimension. Likewise, increasing the level of correlation also has an impact on the distribution of the diagonal terms, similar to what we observe by increasing the dimension.

Figure 5: Comparison between diagonal matrix construction methods, with the dimension increased to 400.

C.2 Measuring the Stability of the Set of SDP-based Undiscoverable Features with Jaccard Similarity

For a given correlation matrix, we generate samples from a centered multivariate Gaussian. Based on the estimated correlation matrix from these samples, we run the SDP construction to get the matrix S, and identify the set of undiscoverable features. By repeatedly doing this, we obtain multiple sets of undiscoverable features. In Figure 6 we plot the Jaccard similarity averaged over all pairs of such sets, as a function of the sample size (and repeat the whole procedure 50 times to estimate the variance of our results). Even though the similarity increases with the sample size, it remains very low. Furthermore, the similarity decreases with the dimension, so in high-dimensional problems the SDP construction method is very unstable and has a very high proportion of undiscoverable features, as suggested in Figure 5. Reproducing findings then becomes very difficult in such settings if we use SDP knockoffs.
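A short sketch of this stability measurement is given below, with a hypothetical cutoff defining "undiscoverable" features; the cutoff value, function names and example sets are ours.

```python
# Sketch: average pairwise Jaccard similarity between sets of "undiscoverable"
# features across repeated runs. Each run would estimate a correlation matrix
# from fresh samples, compute the SDP diagonal s, and flag tiny entries.
from itertools import combinations

def undiscoverable_set(s, cutoff=1e-6):
    # Hypothetical cutoff: features whose diagonal term is essentially zero.
    return {j for j, value in enumerate(s) if value < cutoff}

def average_jaccard(sets):
    def jaccard(a, b):
        return len(a & b) / len(a | b) if (a | b) else 1.0
    pairs = list(combinations(sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Example with three hypothetical runs of the SDP construction:
runs = [{2, 5, 7}, {2, 7, 9}, {5, 7, 9}]
print(average_jaccard(runs))
```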

Figure 6: Average Pairwise Jaccard Similarity for Multiple Runs of SDP Method

C.3 Generating the Synthetic Response for the Real Genome Dataset

We collect data from the 1000 Genomes Project (Consortium et al., 2015), and obtain around 2000 individual samples for 27 distinct segments of chromosome 19, containing an average of 50 SNPs per segment. We filter out SNPs that are extremely correlated (above 0.95), and generate for each of those 27 segments a random subset that corresponds to the causal SNPs. We then generate the response accordingly and use a logistic regression to obtain importance scores. For the top correlation method, we select the top correlated features, where the number selected is chosen to equal the number of rejections that the multi-knockoffs make, so that we have a fair comparison between methods.
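For reference, the top-correlation baseline used for comparison can be sketched as below (function and variable names are ours); k would be set to the number of rejections made by the multi-knockoff procedure on the same data.

```python
# Sketch of the top-correlation baseline: rank SNPs by absolute correlation
# with the response and keep the k most correlated ones.
import numpy as np

def top_correlated_snps(X, Y, k):
    X_c = X - X.mean(axis=0)
    Y_c = Y - Y.mean()
    corr = (X_c * Y_c[:, None]).sum(axis=0) / (
        np.sqrt((X_c ** 2).sum(axis=0)) * np.sqrt((Y_c ** 2).sum())
    )
    return np.argsort(-np.abs(corr))[:k]

# Hypothetical usage: X is the 0/1/2 genotype matrix, Y the synthetic phenotype.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
Y = rng.integers(0, 2, size=200).astype(float)
print(top_correlated_snps(X, Y, k=5))
```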

One important caveat that explains why the averaged FDP is sometimes above the target is that, with real data, it is crucial to accurately estimate the feature distribution. In these simulations, we approximate the 0/1/2 matrix of SNPs by a Gaussian distribution, for which we need to estimate the covariance from the data. Such an inaccurate approximation causes the average FDP to exceed the target sometimes. However, the knockoff procedure is robust to mis-estimations of the feature distribution (Barber et al., 2018), so that we can expect FDR control at an inflated level. Our FDR results are therefore satisfactory, and the comparison is stark with the top correlation method, which catastrophically fails to control FDR.
