Last time, I introduced the **Fusion** feature, which lets you handle multiple data sources as one. By making the most of **microqueries** and transparently treating valuable data sources scattered across the network as a single source, you can further increase the value and potential of your data. I hope it gave you a sense of how the very concept of a data source can evolve in a more efficient and flexible direction.
Starting with this installment, I would like to cover integration with **big data**, which is where **Zoomdata** really shines.
Articles such as "Zoomdata and Hadoop (Hive on Tez) Integration (Azure Edition)" by **Kitase** have already been published, so some of you may have verified the integration yourselves. This time, however, we will use a well-known, publicly available **big data solution**, run its sandbox in a virtual environment, and walk through the basic setup for linking it with a local Zoomdata installation.
As for the **big data** environment, since this series is carried out on virtual machines, I will proceed by using a prebuilt virtual image called a Sandbox. A quick search turns up information on several trial environments, but for this first attempt I will use the Sandbox provided by **Cloudera** and verify the connection with **Zoomdata**.
Select **Downloads** on the Cloudera homepage and choose **DOWNLOAD NOW** under **Quick Starts** (at the time of writing, the QuickStart VM was distributed from **https://www.cloudera.com/downloads/quickstart_vms/5-12.html**). Register with the requested information filled in accurately and download the virtual image you need. From this installment on, in addition to **Zoomdata**, you will need memory and CPU resources for the big data environment used in the connection verification, so depending on your situation you may need to take measures such as running the two on separate machines (while keeping the network configuration consistent). Please build each environment at your own risk, and please refrain from contacting Cloudera with questions about this article.
If the download completes and the virtual machine starts successfully, the desktop screen will be displayed. (Note: the screenshots are in Japanese because I changed the locale purely out of personal preference; the original environment is the English version. The verification works just as well in the English environment, so feel free to proceed as is.)
First, set up the verification data. There is a **Hue** link at the top of the browser screen, so select it. The display changes and the **Step 1** tasks begin; after a short while, the results of the environment check are shown on the screen.
Then select **Examples** in **Step 2**.
Select the data to be used in this verification. Since I plan to verify connections with both **Impala** and the search system **Solr Search**, select and install them in turn.
Check the generated data just in case. Select the **Home** (My Documents) icon at the top of the browser screen.
You can confirm that the verification data has been set up successfully, so also check the IP address of the virtual machine.
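Before moving on to Zoomdata, it can be worth confirming from the Zoomdata host that the sandbox's Impala port is actually reachable at that IP address. The following is a minimal sketch and not part of the original walkthrough: the IP address is a placeholder for the one you just confirmed, and 21050 is Impala's default port for JDBC/HiveServer2-style clients.

```python
import socket

# Placeholder: replace with the sandbox IP address confirmed above
IMPALA_HOST = "192.168.56.101"
IMPALA_PORT = 21050  # Impala's default port for JDBC/HiveServer2-style clients

# Try to open a TCP connection to check that the port is reachable from the Zoomdata host
try:
    with socket.create_connection((IMPALA_HOST, IMPALA_PORT), timeout=5):
        print(f"{IMPALA_HOST}:{IMPALA_PORT} is reachable.")
except OSError as exc:
    print(f"Could not reach {IMPALA_HOST}:{IMPALA_PORT}: {exc}")
```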
Now that the big data source side is ready, let's set up the connection from **Zoomdata**. First, log in as **admin**, select the gear icon at the top of the console screen, and select **Sources**.
Then select the **Cloudera Impala** icon.
Set the required items and select **Next** at the bottom left.
Choose to create a new connector and set the required parameters. Give each connector a unique name and enter the **JdbcUrl** information as follows.
`jdbc:hive2://xxx.xxx.xxx.xxx:21050/;auth=noSasl`
For **xxx.xxx.xxx.xxx**, use the IP address you confirmed earlier. The port number is predefined by each data source, so you can basically use it as is. The other items can be left at their defaults. (For details on these settings, check the vendor's documentation.)
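If you would like to sanity-check the same parameters outside of Zoomdata first, a quick test with the impyla client is one option. This is only a sketch under a few assumptions: the host is the same placeholder as in the JDBC URL above, port 21050 and NOSASL authentication mirror that URL, and the `impyla` package is installed (`pip install impyla`).

```python
from impala.dbapi import connect

# Placeholder: replace with the sandbox IP address used in the JDBC URL
IMPALA_HOST = "xxx.xxx.xxx.xxx"

# Mirror the JDBC URL settings: default Impala port 21050, no SASL authentication
conn = connect(host=IMPALA_HOST, port=21050, auth_mechanism="NOSASL")
cur = conn.cursor()

# List the tables Impala can see; web_logs should appear if the Hue examples were installed
cur.execute("SHOW TABLES")
for (table_name,) in cur.fetchall():
    print(table_name)

cur.close()
conn.close()
```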
After completing the basic settings, select **Validate**; the connection will be verified, so please wait a moment.
If a green pop-up announcing that the connection succeeded appears at the top of the screen, the connection to the data source has been configured successfully. Next, select the data to work with. Select **Next** at the bottom right of the screen.
Several preset samples will be listed; this time I will select **web_logs**, which appears to have the most data items.
The details of each data item will be displayed. Select **Next** at the bottom right. Since the goal this time is a quick trial, you can basically accept the defaults on the subsequent screens. However, to link the data with the **Time Bar**, change a few attributes of the data item mentioned earlier. (Specifically, change the attribute of **day** to **TIME**, set the custom format to **yyyy-MM-dd**, and then set the next item to **DAY**.)
Once the **day** attribute changes and the rest of the parameter settings are complete, **Zoomdata** will be able to access **Impala** with whatever microqueries it needs.
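To get a rough feel for the kind of aggregation such a microquery turns into on the Impala side, here is a small illustrative sketch, not something Zoomdata itself exposes. It assumes the Hue example table web_logs has a city column (the same field used for the chart below) and reuses the placeholder connection settings from the previous sketch.

```python
from impala.dbapi import connect

# Reuse the placeholder connection settings from the previous sketch
conn = connect(host="xxx.xxx.xxx.xxx", port=21050, auth_mechanism="NOSASL")
cur = conn.cursor()

# Roughly the kind of aggregation issued when the web_logs data is grouped by city
cur.execute(
    """
    SELECT city, COUNT(*) AS hits
    FROM web_logs
    GROUP BY city
    ORDER BY hits DESC
    LIMIT 10
    """
)
for city, hits in cur.fetchall():
    print(f"{city}: {hits}")

cur.close()
conn.close()
```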
Now let's create a simple dashboard to verify the connection. The procedure is the same as before, so we will go through it quickly.
From Create Dashboard, via the icon in the upper left, select the **Cloudera Impala** source you just set up.
The available charts will appear; select **Bars** and set **Group** at the bottom of the chart to **City**.
Slide the **Time Bar** at the bottom of the chart to check that the displays stay in sync. Next, let's create a **donut chart**. The procedure is still the same, so here I will only outline the flow.
For the data shown in the **donut chart**, the data items included an **OS** usage ratio, so I selected it as the **Group**.
Finally, save the dashboard you just created. If necessary, change each chart title and the dashboard title (as usual, mine is a rather silly title...), then select **Save** at the top right of the console screen and enter the required information. After that, select **Save** at the bottom right of the pop-up, and the dashboard will appear on the console's home screen so you can start from there next time.
This time, we verified the integration between **big data** and **Zoomdata** in a virtual environment, using **Cloudera Impala** as the example. As you can see, connecting to a properly built big data environment is actually very simple (and the same will hold for each of the **big data solutions** to come). Of course, since this is a verification environment, the data is not truly big, but for solutions designed to **scale out**, **microquery** connections combined with **in-memory technology** enable efficient, high-speed access, so simple and flexible use and operation should be achievable even in a genuinely huge "real big data" environment.
Next time, I would like to connect to **Solr**, whose demo data we confirmed during the setup at the beginning.
In writing this article, we used the Sandbox publicly provided by **Cloudera** as the engine of the big data source. We would like to take this opportunity to express our sincere thanks.