[Java] I tried translating R and Java grammar [Updated from time to time]


Foreword

What is this?

This is a comparison of the syntax of R, a language specialized for processing and analyzing statistical data, with that of Java, a general-purpose language. It is meant as a reference for people who know Java and want to write R.

Required prerequisites (Java side)

  • You understand the variable types (int, double, boolean, String, and roughly Object)
  • You understand multidimensional arrays
  • You can use List and Map (they appear in some sample code)
  • You can use the enhanced for statement (for-each, available since Java 5) (same as above)

If you have these covered, you can follow most of what you are likely to want to do in R.

Required prerequisites (R side)

  • R is installed
  • You know that no semicolon is needed at the end of a statement
  • Motivation

R-Java dictionary

General grammar

Common to Java and R

  • Arithmetic operators other than those related to division (+, -, *)
  • Comparison operators (<, >, <=, >=, !=)
  • Conditional logical operators (&&, ||), logical operators (&, |), negation (!)

Differences between Java and R

                              Java                                     R
Assignment                    n = 0                                    n <- 0
Boolean value                 true, false                              TRUE, FALSE (upper case)
Division (integer quotient)   a / b                                    a %/% b *1
Remainder                     a % b                                    a %% b
Exponentiation                Math.pow(a, b)                           a ^ b
Increment                     ++a or a++                               Does not exist (write a <- a + 1)
Decrement                     --a or a--                               Does not exist (write a <- a - 1)
Standard output               System.out.println("Hello, world!");     print("Hello, world!")
Standard input                Scanner class etc.                       readline("input :") *2
Constant definition           final int N = 0;                         Does not exist
Comment                       //comment  or  /* comment */             #comment *3

Annotation

  • *1: In R, dividing integers with / returns a floating-point value (double), so simply entering 10/4 returns 2.5.
  • *2: Be careful when relying on autocomplete, because there is a similarly named function with a completely different meaning, readLines(). If the session is not interactive, readline() does not wait for input: it returns an empty string immediately and the subsequent processing continues. You can check whether the session is interactive with print(interactive()).
  • *3: To comment out multiple lines, every line needs its own #.
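One more difference matters when porting integer arithmetic: R's %/% and %% round toward negative infinity, whereas Java's / and % on int truncate toward zero, so the results differ for negative operands. The sketch below shows possible Java counterparts using Math.floorDiv and Math.floorMod (available since Java 8). The class name DivisionDemo and its method names are made up for illustration.

```java
public class DivisionDemo {
    // Counterpart of R's a / b: always a floating-point result
    static double realDiv(int a, int b) {
        return (double) a / b;
    }

    // Counterpart of R's a %/% b: rounds toward negative infinity
    static int floorDiv(int a, int b) {
        return Math.floorDiv(a, b);
    }

    // Counterpart of R's a %% b: the result takes the sign of the divisor
    static int floorMod(int a, int b) {
        return Math.floorMod(a, b);
    }

    public static void main(String[] args) {
        System.out.println(10 / 4);           // 2   (Java int division truncates)
        System.out.println(realDiv(10, 4));   // 2.5 (like 10/4 in R)
        System.out.println(-7 / 2);           // -3  (truncation toward zero)
        System.out.println(floorDiv(-7, 2));  // -4  (like -7 %/% 2 in R)
        System.out.println(-7 % 2);           // -1
        System.out.println(floorMod(-7, 2));  // 1   (like -7 %% 2 in R)
    }
}
```

For non-negative operands the two conventions agree, so the distinction only shows up once negative numbers are involved.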

if, for, switch statement

if_for_switch.java


if(statement){
  System.out.println("statement is true");
}

for(int i=0;i<n;i++){
  System.out.println("for loop "+ i);
}

switch(num){
  case 0:
    System.out.println("switch num=0");
    break;
  case 1:
    System.out.println("switch num=1");
    break;
  default:
    System.out.println("switch default");
}

if_for_switch.R


if(statement){
  print("statement is TRUE")
}

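for(i in seq_len(n)){  # seq_len(n) is empty when n is 0, whereas 1:n would yield 1, 0
  print(paste("for loop", i-1))  # paste() inserts a space between its arguments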
}

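# A numeric first argument selects a branch by position, so convert it to match by name
switch(as.character(num),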
  "0" = {print("switch num=0")},
  "1" = {print("switch num=1")},
  {print("switch default")}
)

Annotation

  • As the print call inside the R for statement shows, R cannot join a character string and a number with +. Use the paste() function instead. (Strictly speaking, this is not recommended in Java either.)
  • The {} around each branch of the R switch statement can be omitted when the branch is a single statement.

How to refer to each element of a two-dimensional array (R: matrix)

matrix.java


int[][] a = new int[4][4];

matrix.R


a <- matrix(ncol=4, nrow=4)

Both create nearly the same structure. (There are differences: the initial values are 0 in Java and NA in R, and the contents of an R matrix are not limited to numbers.) Each element of this matrix is referenced as follows.

                       Java        R
Single element         a[2][3]     a[3,4]
Single element         a[3][0]     a[4] or a[4,1] (the ",1" can be omitted only for the first column, because a[k] is a column-major linear index)
Row reference          a[1]        a[2,]
Column reference       -           a[,2]
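Java has no direct counterpart to R's a[,2], because an int[][] is an array of row arrays. A minimal sketch of a helper that copies one column out, using Java's 0-based indices, could look like this. MatrixIndexDemo and column() are hypothetical names, not part of any library, and the array is assumed to be rectangular.

```java
import java.util.Arrays;

public class MatrixIndexDemo {
    // Copies column `col` (0-based) of a rectangular 2D array into a new array.
    static int[] column(int[][] a, int col) {
        int[] result = new int[a.length];
        for (int row = 0; row < a.length; row++) {
            result[row] = a[row][col];
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] a = {
            {1, 2, 3},
            {4, 5, 6}
        };
        // R's a[,2] (second column, 1-based) corresponds to column(a, 1) here
        System.out.println(Arrays.toString(column(a, 1))); // [2, 5]
        // R's a[2,] (second row) corresponds to a[1]
        System.out.println(Arrays.toString(a[1]));         // [4, 5, 6]
    }
}
```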

Avoiding the for statement (the apply family)

Avoiding the for statement is a tendency fairly specific to R. I can’t really explain why, so instead I’ll set the apply family side by side with Java to see what it is doing.

Apply function to each element of multidimensional array (apply)

apply.java


// ---- Create and initialize the array ----
int[][] mtrx = new int[4][4];
for(int i=0;i<4;i++){
  for(int j=0;j<4;j++){  // increment j here, not i
    mtrx[i][j] = i * 4 + j;
  }
}

// ---- Prepare the function to be applied repeatedly ----
// (in a real program this method is declared at class level)
int add(int i){
  return i+1;
}

// ---- Scan the array and apply the function ----
for(int i=0;i<4;i++){
  for(int j=0;j<4;j++){
    mtrx[i][j] = add(mtrx[i][j]);
  }
}

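A two-dimensional array containing the numbers 0 to 15 is created, and 1 is added to each element. (This is deliberately written out longhand for the sake of comparison. Note that the R version below fills its matrix with 1 to 16 rather than 0 to 15.)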

apply.R


# ---- Create and initialize the matrix ----
a <- matrix(c(1:16), ncol=4, nrow=4, byrow=TRUE)

# ---- Prepare the function to be applied repeatedly ----
add <- function(i){
  return(i+1)
}

# ---- Scan the matrix and apply the function ----
apply(a, c(1,2), add)

It can be written this concisely. Here I defined my own function to apply, but any built-in function works as well (for example, passing sqrt gives a matrix of the square roots of all elements).

Annotation

  • The byrow option of the matrix() function fills the matrix row by row from the source vector (c(1:16) in this example). The default is FALSE (fill column by column), so omitting it produces the transpose of the matrix created by this code.
  • The second argument of the apply() function (c(1,2) here) specifies the scope of application: 1 means “all rows” and 2 means “all columns”. If you only want to replace a for loop over a 2D array, you can pass c(1,2) without thinking further about it. Concretely, the following two pieces of code do roughly the same thing. The R version does not store its result in a variable, but it really is just this one line.

apply2.java


int sum(int[] arr){ // Sums an array; R provides this as the built-in sum()
  int sum = 0;
  for(int i: arr){
    sum += i;
  }
  return sum;
}

int[] sum_arr = new int[4];
for(int i=0;i<4;i++){
  sum_arr[i] = sum(mtrx[i]); // apply sum() function to all rows
}

apply2.R


apply(a, 1, sum)
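For comparison, passing 2 instead of 1, as in apply(a, 2, sum), sums each column. A rough Java counterpart might look like the sketch below. ColumnSumDemo and columnSums() are illustrative names only, and the array is assumed to be rectangular and non-empty.

```java
import java.util.Arrays;

public class ColumnSumDemo {
    // Sums each column of a rectangular 2D array, similar in spirit to apply(a, 2, sum).
    static int[] columnSums(int[][] m) {
        int[] sums = new int[m[0].length];
        for (int[] row : m) {
            for (int j = 0; j < row.length; j++) {
                sums[j] += row[j];
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Same contents as the matrix in apply.R: 1 to 16, filled row by row
        int[][] mtrx = new int[4][4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                mtrx[i][j] = i * 4 + j + 1;
            }
        }
        System.out.println(Arrays.toString(columnSums(mtrx))); // [28, 32, 36, 40]
    }
}
```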

Apply function to each element of HashMap (lapply)

First, it is necessary to understand that a list in R does not require the values stored in it to share a type. In other words, it behaves like a Java Map declared to hold Object values.

lapply.java


Map<String, Object> a = new HashMap<>();
a.put("i", 123);
a.put("d", 4.56d);
a.put("b", false);

lapply.R


a <- list(i = 123, d = 4.56, b = FALSE)

In this way, numeric values and logical values can be held in the same list. (In R, 123 written without an L suffix is stored as a double; write 123L for an integer.) lapply applies a function to each element of a list or vector (a matrix included) and returns the results as a list.

lapply2.java


Map<String, Object> a = new HashMap<>();
a.put("i", 123);
a.put("d", 4.56d);
a.put("b", false);

for(Object o : a.values()){
  add(o); // Compile error: add(int) cannot accept an Object
}

lapply2.R


a <- list(i = 123, d = 4.56, b = FALSE)

lapply(a, add) # reuses the add() function from apply.R

Because the add() function defined in apply.java takes an int argument, the code above does not compile in Java. R needs no type declarations and values are coerced automatically to the wider type, so this code runs (with FALSE treated as 0 and TRUE as 1) and outputs the following:

> lapply(a, add)
$i
[1] 124

$d
[1] 5.56

$b
[1] 1
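To get the Java loop past the compiler, the coercion that R performs implicitly has to be spelled out. The following is only a sketch of one way to do that. LapplyDemo, toDouble() and addOne() are hypothetical names, and every value is widened to double, which mirrors R storing 123 as a double.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LapplyDemo {
    // Explicit version of R's implicit coercion: numbers stay numbers, false -> 0, true -> 1
    static double toDouble(Object o) {
        if (o instanceof Number) {
            return ((Number) o).doubleValue();
        }
        if (o instanceof Boolean) {
            return ((Boolean) o) ? 1.0 : 0.0;
        }
        throw new IllegalArgumentException("Unsupported type: " + o.getClass());
    }

    // Adds 1 to every value and returns a new map, similar in spirit to lapply(a, add)
    static Map<String, Double> addOne(Map<String, Object> input) {
        // LinkedHashMap keeps insertion order, as an R list does
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : input.entrySet()) {
            result.put(e.getKey(), toDouble(e.getValue()) + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("i", 123);
        a.put("d", 4.56d);
        a.put("b", false);
        System.out.println(addOne(a));
    }
}
```

Unlike R, this version rejects any value that is neither a Number nor a Boolean, which seemed the safer default for a sketch.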

lapply also works on a two-dimensional array, but the result is awkward to handle unless you specifically need a list. Think of it as that kind of function. For the 4x4 matrix created in apply.java (apply.R), the following two pieces of code output roughly the same thing.

lapply3.java


Map<String, Integer> lapply(int[][] arr){
  Map<String, Integer> a = new HashMap<>();
  for(int i=0;i<arr[0].length;i++){
    for(int j=0;j<arr.length;j++){
      a.put(String.valueOf(i * 4 + j + 1), add(arr[j][i]));
      // Note that this is arr[j][i], not arr[i][j]:
      // R scans a 2D array column by column
    }
  }
  return a;
}

lapply3.R


lapply(a, add)
# As in apply.R, add() is applied to every element of a, but
# the output looks different from that of apply.R. Run it and check.

Annotation

  • lapply3.java uses Map<String, Integer> as its return type, which looks redundant because the keys are just the numbers 1 to 16 converted to String. However, the rows and columns of an R matrix can carry names, so in an exact correspondence the key might not be a number. That is why it is written this way.

String processing

On the whole, string processing is not one of R’s strengths. If your data pre-processing requires a large amount of string manipulation, consider doing it first in another language or in spreadsheet software.

Concatenation

str_concat.java


// Example 1
// Discouraged for repeated concatenation (e.g. in loops), where it creates many temporary strings
String str1 = "Hello" + " " + "world!";

// Example 2
StringBuilder builder = new StringBuilder();
builder.append("Hello");
builder.append(" ");
builder.append("world!");
String str2 = builder.toString();

str_concat.R


# paste() inserts a space between its arguments by default
str <- paste("Hello", "world!")
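On the Java side, the closest built-in counterpart to paste() is probably String.join (Java 8 and later), which also takes the separator separately from the pieces. A small sketch follows. JoinDemo, pasteLike() and paste0Like() are made-up names, and the second helper corresponds to R's paste0(), which joins with no separator.

```java
public class JoinDemo {
    // Joins the parts with a single space, like R's paste() with its default sep
    static String pasteLike(String... parts) {
        return String.join(" ", parts);
    }

    // Joins the parts with no separator, like R's paste0()
    static String paste0Like(String... parts) {
        return String.join("", parts);
    }

    public static void main(String[] args) {
        System.out.println(pasteLike("Hello", "world!"));  // Hello world!
        System.out.println(paste0Like("Hello", "world!")); // Helloworld!
    }
}
```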

Supplement

Execution environment of code in this page

Java

  • Java version: 1.8.0_231
  • IDE: IntelliJ IDEA Community Edition 2019.2

R

  • R version: R x64 3.6.0
  • IDE: RStudio 1.2.1335

Refer to this if the code does not run in your environment.