Remove rows with low variance and impute missing values

adjust_matrix(m, sd_quantile = 0.05, max_na = 0.25, verbose = TRUE)

Arguments

m

A numeric matrix.

sd_quantile

Quantile cutoff for the row standard deviations. Rows whose standard deviation is below this quantile are removed.

max_na

Maximum fraction of NA values allowed in a row. Rows with a larger NA fraction are removed.

verbose

Whether to print messages.
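
To get a feel for sd_quantile and max_na before calling the function, the per-row quantities they are compared against can be computed by hand. The following is a minimal base-R sketch on a toy matrix (illustrative data, not from the package). It assumes the standard deviation is taken per row with NA values ignored; the function itself may compute it at a different stage, for instance after imputation.

```r
# Toy matrix (illustrative data only).
m = matrix(c( 1,  2,  3, 4,
             NA, NA,  1, 5,
              2,  2,  2, 2,
              0,  3, NA, 9), nrow = 4, byrow = TRUE)

# Fraction of NA values in each row; this is what max_na is compared against.
na_frac = rowMeans(is.na(m))

# Row standard deviations (NA ignored here, which is an assumption) and the
# value at the 0.05 quantile, i.e. the cutoff implied by sd_quantile = 0.05.
row_sd = apply(m, 1, sd, na.rm = TRUE)
sd_cutoff = quantile(row_sd, 0.05)

# Rows that would be flagged under the default thresholds.
which(na_frac > 0.25)
which(row_sd < sd_cutoff)
```

Inspecting na_frac and row_sd this way shows how many rows a given threshold would drop before any data is changed.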

Details

The function imputes missing values with impute.knn, adjusts outliers with adjust_outlier, and then removes rows with low standard deviations. As the messages in the example show, rows with zero variance are removed as well.
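
The steps above can be sketched in base R. This is not the package's implementation: sketch_adjust_matrix and its argument q are hypothetical names, row-mean imputation stands in for impute.knn, and a per-row quantile clamp stands in for adjust_outlier. The order of the steps is also an assumption.

```r
# Hypothetical helper, not part of the package.
sketch_adjust_matrix = function(m, sd_quantile = 0.05, max_na = 0.25, q = 0.05) {
    # 1. Drop rows whose NA fraction exceeds max_na.
    m = m[rowMeans(is.na(m)) <= max_na, , drop = FALSE]

    # 2. Stand-in imputation: replace each NA with the mean of its row.
    #    (The real function uses impute.knn instead.)
    for (i in seq_len(nrow(m))) {
        na_idx = is.na(m[i, ])
        if (any(na_idx)) m[i, na_idx] = mean(m[i, ], na.rm = TRUE)
    }

    # 3. Stand-in outlier adjustment: clamp each row to its [q, 1 - q] quantiles.
    m = t(apply(m, 1, function(x) {
        qu = quantile(x, c(q, 1 - q))
        pmin(pmax(x, qu[1]), qu[2])
    }))

    # 4. Remove zero-variance rows, then rows whose sd is below the
    #    sd_quantile quantile of the remaining row sds.
    row_sd = apply(m, 1, sd)
    keep = row_sd > 0
    m = m[keep, , drop = FALSE]
    row_sd = row_sd[keep]
    m[row_sd >= quantile(row_sd, sd_quantile), , drop = FALSE]
}

# Same input construction as in the Examples section.
set.seed(123)
m = matrix(rnorm(100), nrow = 10)
m[sample(length(m), 5)] = NA
m[1, ] = 0
m2 = sketch_adjust_matrix(m)
dim(m2)
```

Because the imputation and outlier steps are simplified stand-ins, the values in m2 will generally differ from what adjust_matrix returns; the sketch only illustrates the filtering logic.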

Value

A numeric matrix with missing values imputed and low-variance rows removed.

Author

Zuguang Gu <z.gu@dkfz.de>

Examples

set.seed(123)
m = matrix(rnorm(100), nrow = 10)
m[sample(length(m), 5)] = NA
m[1, ] = 0
m
#>              [,1]        [,2]       [,3]       [,4]        [,5]       [,6]
#>  [1,]  0.00000000  0.00000000  0.0000000  0.0000000  0.00000000  0.0000000
#>  [2,]  0.70610908 -0.01260453  0.3207204         NA  1.71798542 -1.4614942
#>  [3,]  1.48902132  0.17625437 -0.7470537         NA -0.32774780  0.3494173
#>  [4,] -1.81509255 -1.58368027 -0.2179812  1.0140337  0.39628362  1.0456287
#>  [5,]  0.33040958  0.46779179 -0.5137169  1.5790067          NA -0.6271371
#>  [6,] -1.14215571  1.19461175  0.6018131  0.6687560  0.01330582  0.3957078
#>  [7,]  0.15719342  0.77458193 -1.5498219 -0.3535221 -0.63898639 -1.2075114
#>  [8,] -2.06540724  0.08710445 -1.7096228 -1.4058683  2.24830001  0.9457638
#>  [9,] -0.44054688          NA  0.7701488 -2.4582271  0.06632788         NA
#> [10,]  0.00395328  1.21997897 -0.7168730  2.8895694  0.03159166 -0.2721696
#>               [,7]        [,8]        [,9]        [,10]
#>  [1,]  0.000000000  0.00000000  0.00000000  0.000000000
#>  [2,] -1.360684436 -1.23465746 -1.56740996  0.008486843
#>  [3,] -0.320849537  0.04221836 -0.61225871  0.773146260
#>  [4,] -1.123577619 -0.79198362 -0.29797771 -1.151920752
#>  [5,]  1.052020871 -0.38886174  0.34063680  0.862577468
#>  [6,] -1.036255798 -0.74227068  0.13373072  0.566374004
#>  [7,]  1.114446832  0.77322966  0.86266186 -0.653870279
#>  [8,] -0.530456611  0.68250298  0.05063779  0.075560593
#>  [9,]  0.001133325 -0.21793606  1.22458533  0.557868206
#> [10,] -1.231623777 -0.63878196  0.01722424 -1.098786692
m2 = adjust_matrix(m)
#> There are NA values in the data, now impute missing data.
#> 1 rows have been removed with zero variance.
#> 1 rows have been removed with too low variance (sd <= 0.05 quantile)
m2
#>             [,1]        [,2]       [,3]       [,4]        [,5]        [,6]
#> [1,]  0.70610908 -0.01260453  0.3207204  0.1352084  1.26264107 -1.46149424
#> [2,] -1.71095702 -1.58368027 -0.2179812  1.0140337  0.39628362  1.03141096
#> [3,]  0.33040958  0.46779179 -0.5137169  1.3418631  0.63966296 -0.57609802
#> [4,] -1.09450075  0.95797664  0.6018131  0.6687560  0.01330582  0.39570778
#> [5,]  0.15719342  0.77458193 -1.3957821 -0.3535221 -0.63898639 -1.20751138
#> [6,] -1.90530426  0.08710445 -1.7096228 -1.4058683  1.66215873  0.94576382
#> [7,] -0.44054688  0.23279917  0.7701488 -1.5502710  0.06632788 -0.08002031
#> [8,]  0.00395328  1.21997897 -0.7168730  2.1382537  0.03159166 -0.27216958
#>              [,7]       [,8]        [,9]        [,10]
#> [1,] -1.360684436 -1.2346575 -1.51974788  0.008486843
#> [2,] -1.123577619 -0.7919836 -0.29797771 -1.151920752
#> [3,]  1.052020871 -0.3888617  0.34063680  0.862577468
#> [4,] -1.036255798 -0.7422707  0.13373072  0.566374004
#> [5,]  1.001143595  0.7732297  0.86266186 -0.653870279
#> [6,] -0.530456611  0.6825030  0.05063779  0.075560593
#> [7,]  0.001133325 -0.2179361  1.02008889  0.557868206
#> [8,] -1.171847089 -0.6387820  0.01722424 -1.098786692