Sorting out Apache Spark's SparseVector and DenseVector
"Sparse" means thinly populated.
A sparse vector is useful when most of the elements of a vector are 0. For example, take the vector
[0.1,0.0,0.0,0.0,0.3]
To express this vector, it is enough to record that **the value of the first element is 0.1**, **the value of the last element is 0.3**, and **the number of elements is 5**. (The values of the other elements are **0.0**.)
Because less information has to be stored, the benefit is that **memory can be saved**.
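The idea can be sketched in plain Java without any Spark dependency. The class name `SparseSketch` and its fields are hypothetical and exist only for illustration; this is not how Spark itself is implemented, just a minimal model of "size plus non-zero positions plus non-zero values".

```java
import java.util.Arrays;

// Hypothetical illustration class; this is NOT Spark's SparseVector.
class SparseSketch {
    final int size;        // total number of elements, zeros included
    final int[] indices;   // positions of the non-zero elements
    final double[] values; // the non-zero values, in the same order as indices

    SparseSketch(int size, int[] indices, double[] values) {
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    // Expand to an ordinary array; positions not listed in indices stay 0.0.
    double[] toArray() {
        double[] dense = new double[size];
        for (int i = 0; i < indices.length; i++) {
            dense[indices[i]] = values[i];
        }
        return dense;
    }

    public static void main(String[] args) {
        // "first element is 0.1, last element is 0.3, number of elements is 5"
        SparseSketch v = new SparseSketch(5, new int[] { 0, 4 }, new double[] { 0.1, 0.3 });
        System.out.println(Arrays.toString(v.toArray())); // [0.1, 0.0, 0.0, 0.0, 0.3]
    }
}
```

Only two indices and two values are stored here, yet the full five-element vector can be rebuilt whenever it is needed.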
Most machine learning libraries implement something like a sparse vector.
Creating a sparse vector in Spark is easy.
Use **org.apache.spark.ml.linalg.SparseVector** in the **spark.ml** package.
The **SparseVector** class is initialized with the vector size, an array of indexes (indices), and an array of values (values).
To create [0.1,0.0,0.0,0.0,0.5], pass the size 5, the index array new int[] {0, 4}, and the value array new double[] {0.1, 0.5}.
SparseVector
// SparseVector(Sparse vector)
int size = 5; // Vector element size (must be larger than the largest index)
int[] indices = new int[] { 0, 4 };
double[] svalues = new double[] { 0.1, 0.5 };
Vector svec = new SparseVector(size, indices, svalues);
System.out.println("SparseVector=" + Arrays.toString(svec.toArray()));
Execution result
SparseVector=[0.1, 0.0, 0.0, 0.0, 0.5]
The vector can be converted to an array with Vector#toArray, but internally only indices (the array of subscripts) and values (the array of values) are retained, so memory is saved as long as the full array is not needed.
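To get a feel for how much is saved, here is a rough back-of-the-envelope sketch in plain Java. It counts only the raw payload (8 bytes per double, 4 bytes per int index) and deliberately ignores object and array headers and the size field, so the numbers are an approximation under those assumptions, not a measurement of Spark's real memory use. The class and method names are hypothetical.

```java
// Hypothetical helper for a rough estimate; not part of Spark.
class VectorPayloadEstimate {

    // Dense storage: one 8-byte double for every element, zeros included.
    static long denseBytes(long size) {
        return size * Double.BYTES;
    }

    // Sparse storage: one 4-byte int index plus one 8-byte double per non-zero element.
    static long sparseBytes(long nonZeros) {
        return nonZeros * (Integer.BYTES + Double.BYTES);
    }

    public static void main(String[] args) {
        // The 5-element example with 2 non-zero values
        System.out.println("dense,  size 5          : " + denseBytes(5) + " bytes");
        System.out.println("sparse, 2 non-zeros     : " + sparseBytes(2) + " bytes");

        // The gap grows with the vector size when the number of non-zeros stays small
        System.out.println("dense,  size 1,000,000  : " + denseBytes(1_000_000) + " bytes");
        System.out.println("sparse, 2 non-zeros     : " + sparseBytes(2) + " bytes");
    }
}
```

Under this estimate the sparse form costs 12 bytes per non-zero element against 8 bytes per element for the dense form, so it only pays off while fewer than about two thirds of the elements are non-zero.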
The counterpart of a sparse vector is a dense vector. It holds all the values of the vector elements, like an ordinary array.
[0.1,0.0,0.0,0.0,0.5]
**DenseVector** is initialized with an array of values (values).
To create [0.1,0.0,0.0,0.0,0.5], pass the array new double[] {0.1, 0.0, 0.0, 0.0, 0.5}, which contains every element.
DenseVector
// DenseVector(Dense vector)
double[] dvalues = new double[] { 0.1, 0.0, 0.0, 0.0, 0.5 };
Vector dvec = new DenseVector(dvalues);
System.out.println("DenseVector=" + Arrays.toString(dvec.toArray()));
Execution result
DenseVector=[0.1, 0.0, 0.0, 0.0, 0.5]
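Since both forms describe the same vector, the sparse pieces can be derived from a dense array by scanning for non-zero elements. The following is a minimal plain-Java sketch of that scan; the class and method names are hypothetical, and treating only exact 0.0 as "zero" is an assumption made for simplicity.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helper; derives sparse components from a dense array.
class DenseToSparseSketch {

    // Positions of the elements that are not exactly 0.0
    static int[] nonZeroIndices(double[] dense) {
        return IntStream.range(0, dense.length)
                .filter(i -> dense[i] != 0.0)
                .toArray();
    }

    // The non-zero values themselves, in the same order as nonZeroIndices
    static double[] nonZeroValues(double[] dense) {
        return Arrays.stream(dense)
                .filter(v -> v != 0.0)
                .toArray();
    }

    public static void main(String[] args) {
        double[] dense = { 0.1, 0.0, 0.0, 0.0, 0.5 };
        System.out.println("size    = " + dense.length);                            // 5
        System.out.println("indices = " + Arrays.toString(nonZeroIndices(dense)));  // [0, 4]
        System.out.println("values  = " + Arrays.toString(nonZeroValues(dense)));   // [0.1, 0.5]
    }
}
```

The resulting size, indices, and values are the same three arguments that the SparseVector constructor takes.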
Use **Apache Spark** from Java
SparkVectorExamples.java
package org.riversun.spark;
import java.util.Arrays;
import org.apache.spark.ml.linalg.DenseVector;
import org.apache.spark.ml.linalg.SparseVector;
import org.apache.spark.ml.linalg.Vector;
public class SparkVectorExamples {
    public static void main(String[] args) {
        // DenseVector (dense vector): every element is stored
        double[] dvalues = new double[] { 0.1, 0.0, 0.0, 0.0, 0.5 };
        Vector dvec = new DenseVector(dvalues);
        System.out.println("DenseVector=" + Arrays.toString(dvec.toArray()));
        // SparseVector (sparse vector): only non-zero elements are stored
        int size = 5; // Vector element size
        int[] indices = new int[] { 0, 4 };
        double[] svalues = new double[] { 0.1, 0.5 };
        Vector svec = new SparseVector(size, indices, svalues);
        System.out.println("SparseVector=" + Arrays.toString(svec.toArray()));
    }
}
Execution result
DenseVector=[0.1, 0.0, 0.0, 0.0, 0.5]
SparseVector=[0.1, 0.0, 0.0, 0.0, 0.5]