In recent years, the environment surrounding **big data** has drawn particular attention for its dramatic evolution and potential, together with technologies such as **machine learning** and **AI**, as a seed of new problem solving and value creation, and with the growth of the networked society it has grown into a large market.

Until now, however, the world around data has been shaped more by device performance and infrastructure constraints than by the original value of the data itself: data has been reduced as much as possible so it can be stored efficiently, and its reuse has been built on top of those reduced forms. A great deal of effort, along with many excellent theories and techniques, has gone into the question of how to infer and reproduce, with reasonable validity, the information that was thrown away. Of course, the data itself, as a hard fact, speaks more eloquently than anything else, so most work still follows the cycle of (1) running the prescribed processing on the data, (2) verifying a hypothesis against the result, (3) planning the next move based on that result, and (4) implementing it and collecting the next data. In other words, most work "starts by waiting for the result of data processing."

But what happens when, on a general-purpose platform, anyone can watch the process by which the final form of the data takes shape **on-stream (observing rather than guessing)** and can respond quickly to changes in the business environment? And what happens next, if that lets you draw out the hidden potential and value your data originally had? I would like to return to this topic in more detail later, but for now I will continue explaining how to build an environment in which the various kinds of big data our environments depend on can be processed at high speed and with high efficiency, and visualized and analyzed **on-stream (without interrupting the user's own thinking and work)**.
This time, we will use the sandbox published by **Hortonworks** and link it with **Zoomdata**. Select the orange **Start** button at the upper right of the Hortonworks home page, and on the page that appears select **Download SANDBOX**; a list of available virtual images will be displayed (as of the time of writing), so download version 2.6 this time. An input form for the required items will appear at that point, so register each piece of information accurately and obtain the environment. After the download, once the installation in your virtual environment succeeds, each service will start and the console screen will be displayed.
The console screen shows how to access the web-based management console, so try entering that address in your browser.
Select **LAUNCH DASHBOARD** on the left side of the screen, enter your ID and password, and you should see the dashboard.
Just in case, I checked whether there was any data that could be used for the verification, and there seem to be some interesting tables, so I will use them when creating a dashboard after the connection is made.
First, log in as **admin** and open the usual **Sources** page. This time we will connect using **Hive On Tez**, so select that icon.
As usual, enter the required information and select **Next** at the bottom right of the screen.
The settings this time are as follows.
Connection Name : Hortonworks Hive On Tez
JdbcUrl : jdbc:hive2://xxxx.xxxx.xxxx.xxxx:10000/default

Here, xxxx.xxxx.xxxx.xxxx is the address of the virtual machine where the sandbox is running.
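Zoomdata handles the actual connection, but before filling this in it can be reassuring to confirm that the HiveServer2 endpoint referenced by the JdbcUrl is reachable at all. The following is only a minimal sketch for that check, run from your own machine rather than inside Zoomdata; it assumes the PyHive package is installed, the sandbox's default HiveServer2 port 10000, and a placeholder sandbox user name (`raj_ops`) that you should replace with whichever account your sandbox provides.

```python
# Minimal reachability check for the sandbox's HiveServer2 endpoint.
# Assumptions: PyHive is installed, HiveServer2 listens on port 10000,
# and "raj_ops" is a valid sandbox user (replace with your own).
from pyhive import hive

SANDBOX_HOST = "xxxx.xxxx.xxxx.xxxx"  # address of the VM running the sandbox

conn = hive.Connection(
    host=SANDBOX_HOST,
    port=10000,
    username="raj_ops",
    database="default",
)

cursor = conn.cursor()
cursor.execute("SHOW TABLES")          # list the tables visible in "default"
for (table_name,) in cursor.fetchall():
    print(table_name)

cursor.close()
conn.close()
```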
After completing the settings, select **Validate** and wait for a while.
Once the connection succeeds, you will be able to select the available data tables, so this time I would like to create a dashboard using **Store**.
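If you want to see what the chart will be drawing before you build it, you can also peek at the table from outside Zoomdata. This is only a hedged sketch: it assumes the table shows up as `store` in the `default` database and reuses the same connection assumptions as the previous sketch.

```python
# Hedged peek at the table behind the dashboard.
# Assumptions: the table is named "store" in the "default" database and the
# connection settings match the earlier reachability check.
from pyhive import hive

conn = hive.Connection(host="xxxx.xxxx.xxxx.xxxx", port=10000,
                       username="raj_ops", database="default")
cursor = conn.cursor()

# Column names and types first...
cursor.execute("DESCRIBE store")
for column_name, column_type, _comment in cursor.fetchall():
    print(column_name, column_type)

# ...then a handful of rows.
cursor.execute("SELECT * FROM store LIMIT 5")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```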
I will not change any of the subsequent settings this time, so proceed briskly through them and move to the **Zoomdata** home page. There is a **+ New** button at the upper left of the screen, so select it; several menus will pop up. Select **Chart & Dashboard**, and from the data source menu select the **Hive On Tez** source configured this time to connect.
The available chart types will appear, so for now select **Bars: Multiple Metrics** and enable all the items that can be displayed (**Volume** is removed for clarity).
Connection verification is complete when the bar chart showing the five configured items is displayed.
In the same way, let's create visualization charts for the other available information and build a playful **"mock dashboard"** using the methods introduced so far.
Save the completed dashboard so that you can easily reuse it next time.
The dashboard was saved successfully.
By the way, this time we performed the connection verification using **Hive On Tez**, but I hope this shows that, as long as the data source side is solidly built, visualizing and analyzing **big data** is easier than you might expect. From the next article onward, we will continue to verify connections to several solutions and introduce related information around them. Thank you for your continued interest.
For the creation of this article, we used the Sandbox published by **Hortonworks** as the engine of the big data source. We would like to take this opportunity to thank them.
In the previous article, I introduced **Fusion**, which lets different data sources be treated as if they were a single data source by defining a common key between them. As we proceeded with this round of connection verification of **big data** solutions, the test data prepared in advance happened to contain tables with the same structure, so I would like to use that data to verify **"fusion across different big data solutions"**. (This time, we will use **sample_07** and **sample_08** to verify the combination of **Cloudera Impala** and **Hortonworks Hive On Tez**.)
We will now define the **Fusion** connector, but the procedure is the same as the one introduced so far, so a detailed explanation is omitted here. Note that when configuring **Fusion** between **big data** sources, the parameters must be entered in English due to the product's specifications.
This time, **Fusion** is defined using the **code** column, which both tables have in common.
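To make the idea concrete, here is a conceptual sketch of what this Fusion amounts to: fetch rows from each engine separately and line them up on the shared **code** key. This is not Zoomdata's actual implementation, only an illustration. It assumes the impyla and PyHive packages, the engines' default ports (Impala 21050, HiveServer2 10000), placeholder host addresses, and that **sample_07** / **sample_08** follow the standard sample schema (code, description, total_emp, salary).

```python
# Conceptual illustration of Fusion on the common "code" key.
# Assumptions: impyla + PyHive + pandas installed, default ports,
# placeholder hosts, and the standard sample_07/sample_08 schema.
import pandas as pd
from impala.dbapi import connect as impala_connect
from pyhive import hive

IMPALA_HOST = "yyyy.yyyy.yyyy.yyyy"  # Cloudera (Impala) address
HIVE_HOST = "xxxx.xxxx.xxxx.xxxx"    # Hortonworks sandbox address

# sample_07 comes from Cloudera Impala ...
impala_conn = impala_connect(host=IMPALA_HOST, port=21050)
df_07 = pd.read_sql("SELECT code, description, salary FROM sample_07", impala_conn)

# ... and sample_08 comes from Hortonworks Hive On Tez.
hive_conn = hive.Connection(host=HIVE_HOST, port=10000, database="default")
df_08 = pd.read_sql("SELECT code, salary FROM sample_08", hive_conn)

# The shared "code" column plays the role of the Fusion key.
fused = df_07.merge(df_08, on="code", suffixes=("_07", "_08"))
print(fused.head())
```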
After that, if you proceed as is, this **Fusion** will be registered on the **Sources** page, so click **+ New** at the top left of the screen, select **Chart & Dashboard**, choose the new source, and create a chart with **Bars: Multiple Metrics**.
The maximum number of data points to be displayed is shown at the upper left of the chart, so select that number and change the maximum value in the pop-up (changed from 20 to 100 this time).
By pulling data from the two different **big data** sources, **Impala** and **Hive On Tez**, based on the common key information and setting the parameters so that both are displayed on the same chart, we were able to verify that the chart is created by accessing both sources via **micro-queries**.
I believe that using **Fusion** across different **big data** sources can provide innovative ideas for data utilization and opportunities for new problem solving, so I hope you will take advantage of it.