[JAVA] [100%] [GenomicRangeQuery] (src & junit & author's note)

Respectable

GenomicRangeQuery

Find the smallest nucleotide in a series of sequence DNAs.

Task description

The DNA sequence can be represented as a string consisting of the letters A, C, G, and T. These correspond to the types of contiguous nucleotides in the sequence. Each nucleotide has an impact factor that is an integer. The impact factors for nucleotides of types A, C, G, and T are 1, 2, 3, and 4, respectively. You are trying to answer some questions of the form: What is the minimum impact factor of nucleotides contained in a particular part of a given DNA sequence?

The DNA sequence is given as a non-empty string of N characters S = S [0] S [1] ... S [N-1]. There are M queries, each consisting of M integers, specified by the non-empty arrays P and Q. The Kth query (0 ≤ K <M) needs to find the minimum impact factor of the nucleotides contained in the DNA sequence between positions P [K] and Q [K].

For example, consider the string S = CAGCCTA and the arrays P, Q as follows:

    P [0] = 2 Q [0] = 4
    P [1] = 5 Q [1] = 5
    P [2] = 0 Q [2] = 6

The answer to these M = 3 queries is:

The part between positions 2 and 4 of DNA contains nucleotides G and C (twice). The answer is 2 because these impact factors are 3 and 2, respectively. The answer is 4 because the part between the 5th and 5th positions contains a single nucleotide T with an impact factor of 4. The answer is 1 because the part between positions 0 and 6 (the entire string) contains all the nucleotides, especially the nucleotide A with an impact factor of 1.

Write a function:

class solution {public int [] solution(String S, int [] P, int [] Q); }

This is an M number that specifies consecutive answers to all queries given a non-empty string S consisting of N characters and two non-empty arrays P and Q consisting of M integers. Returns an array consisting of the integers of.

The resulting array should be returned as an array of integers.

For example, given the string S = CAGCCTA and the arrays P, Q:

    P [0] = 2 Q [0] = 4
    P [1] = 5 Q [1] = 5
    P [2] = 0 Q [2] = 6

As explained above, the function should return the value [2, 4, 1].

Write an efficient algorithm for the following assumptions:

N is an integer in the range [1..100,000].
M is an integer in the range [1..50,000].
Each element of the array P and Q is an integer in the range [0..N −1].
P [K] ≤ Q [K], where 0 ≤ K <M;
The string S consists of only uppercase letters A, C, G, and T.

Author's note

This is a "Prefix Sums" scenario. Since the range of values for each letter of S is [1,2,3,4], we need four "Prefix Sums" arrays, that is, int [4] [S.length ()]. And you can quickly find out the number of 1, 2, 3, 4 in any slice (O (4)).

This is an individual economic “pre-wa” application. S middle-sized individual character array 范围围围围围围围围围围围围围围围范 [1,4] This sample can be rapid (O (4)) Inclusive 1,2,3,4 intercept in the intercept of the intellectual way.

PrefixSums Matrix:

S	=	C	A	G	C	C	T	A
DNA		2	1	3	2	2	4	1
prefixSums	[0]	0	1	1	1	1	1	2
prefixSums	[1]	1	1	1	2	3	3	3
prefixSums	[2]	0	0	1	1	1	1	1
prefixSums	[3]	0	0	0	0	0	1	1

`Example`


P [0] = 2 Q [0] = 4

S	=	C	A	G	C	C	T	A
DNA		2	1	3	2	2	4	1
A	prefixSums[0]	0	1	1	1	1	1	2
C	prefixSums[1]	1	1	1	2	3	3	3
G	prefixSums[2]	0	0	1	1	1	1	1
T	prefixSums[3]	0	0	0	0	0	1	1

According to the table above:

prefixSums [0] [4 + 1] --prefixSums [0] [2]) == 0, that is, there is no "A" in the slice
prefixSums [1] [4 + 1] --prefixSums [1] [2])> 0, that is, there is a "C" in the slice That is, the minimum value for that slice is 2, and so on.

solve

Program GenomicRangeQuerySolution.java

`GenomicRangeQuerySolution.java`



    public int[] solution(String S, int[] P, int[] Q) {
        int[] goal = new int[P.length];
        int[][] prefixSums = new int[3][S.length() + 1];
        int[] DNA = new int[]{1, 2, 3, 4};
        short a, c, g;
        for (int i = 0; i < S.length(); i++) {
            a = 0;
            c = 0;
            g = 0;
            switch (S.charAt(i)) {
                case 'A':
                    a = 1;
                    break;
                case 'C':
                    c = 1;
                    break;
                case 'G':
                    g = 1;
                    break;
                default:
                    break;
            }
            prefixSums[0][i + 1] = prefixSums[0][i] + a;
            prefixSums[1][i + 1] = prefixSums[1][i] + c;
            prefixSums[2][i + 1] = prefixSums[2][i] + g;
        }

        for (int j = 0; j < P.length; j++) {
            int from = P[j];
            int to = Q[j] + 1;
            if (prefixSums[0][to] - prefixSums[0][from] > 0) {
                goal[j] = DNA[0];
            } else if (prefixSums[1][to] - prefixSums[1][from] > 0) {
                goal[j] = DNA[1];
            } else if (prefixSums[2][to] - prefixSums[2][from] > 0) {
                goal[j] = DNA[2];
            } else {
                goal[j] = DNA[3];
            }
        }

        return goal;
    }

Detected time complexity:
O(N + M)

jUnit GenomicRangeQuerySolutionTest.java

Report trainingEJ6PEK-VQT/