DSX Desktop (beta), the desktop version of IBM's analysis software Data Science Experience, has been upgraded, so I installed it. (I tried it briefly right after the beta started, but that was a few months ago.) The download is a little over 9GB, which is on the large side, but it is nice that it ships as a Docker image from the start :slight_smile:
Download it from https://datascience.ibm.com/desktop. I installed the Mac version.
When you run the downloaded file, you will see this screen.
Drag and drop DSX Desktop onto the folder in that window and "IBM DSX Desktop" is created in the Applications folder, so launch it from there.
(I forgot to take a screenshot.) You will be prompted to choose options such as "Install both Notebook and R Studio" and whether to use Spark with Notebook. The download was about 6GB without Spark, about 9GB with Spark (roughly 3GB more), and about 11GB with R Studio as well. It seems the R Studio part can be installed later, so I went as far as Spark, since I was installing anyway.
As I continued the installation, a long download started. It was about 9GB, and I think it took about 5 hours on my home LAN. It failed once partway through and I had to try again, because the download stalled midway. For the retry I temporarily turned off the Mac's screen saver and let it run uninterrupted. After the download completes, an Extract step runs for about 5 minutes and the installation is done. (The download was the only part of the installation where I got stuck. The rest went smoothly.)
When I ran it, it felt very lightweight, which impressed me :grin: It may not be as good as the DSX offered as SaaS in the cloud, though. After it starts, click the "≡" icon at the upper left to display the screen for creating a notebook.
(This screenshot was taken after I had already created various things, but the notebook creation screen looks like this. Click add notebook to create one.)
Of course, notebooks created in Jupyter also work. (However, as described later, the directory structure is the one inside the Docker container, not the one on the Mac, so file paths will not work as they are.)
When I tried running a notebook right after installation, pandas' read_excel raised an error. The cause was that xlrd was not included. I added it with !pip install xlrd in my notebook, and now it runs normally.
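As a rough sketch of how to check for this kind of gap before a notebook fails, the cell below tests whether packages are importable in the current kernel. The helper name missing_modules is made up for illustration and is not part of DSX Desktop or pandas.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here.

    Hypothetical helper for illustration only.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# In this setup pandas.read_excel failed because xlrd was absent.
missing = missing_modules(["pandas", "xlrd"])
if missing:
    # In a notebook cell, the leading "!" runs the line as a shell command.
    print("Install from a notebook cell with: !pip install " + " ".join(missing))
else:
    print("pandas and xlrd are both importable")
```

Because the packages live inside the container, this check has to be run from a notebook in DSX Desktop rather than from a Python on the Mac.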
Unlike the cloud version, DSX Desktop seems to restrict the file formats it accepts, at least in the beta. To register a file, press the add dataset button, or press the icon at the upper right (the one that looks like a 2x2 identity matrix of 1s and 0s). When you import a local file this way, it is copied into the Docker container.
The files seem to be stored under /opt/notebooks/assets. (In the screenshot above, I ran pwd to see the default working directory at runtime, and listed the assets folder containing the registered files.)
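The same check can be done from a notebook cell in Python instead of shell commands. This is a minimal sketch that assumes the /opt/notebooks/assets path observed above. The function name list_assets is made up for illustration.

```python
import os

# Path observed in this beta; it may differ in other versions.
ASSETS_DIR = "/opt/notebooks/assets"


def list_assets(assets_dir=ASSETS_DIR):
    """Return the sorted entries in assets_dir, or [] if it does not exist."""
    if not os.path.isdir(assets_dir):
        return []
    return sorted(os.listdir(assets_dir))


# Equivalent of running pwd and listing the assets folder.
print("working directory:", os.getcwd())
print("registered files:", list_assets())
```

Outside the container the directory does not exist, so the list is simply empty.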
Since it runs on Docker and I know the directory, I also registered a file from the Mac terminal on the command line. The list.txt in the screenshot above was registered that way.
If you check the running containers, it is called anaconda_with_spark. (The second and later entries may not be ones installed this time.)
If you run a shell in the container where DSX Desktop is running (with docker exec), you can see what is going on inside. I don't know whether customizing it is OK, but you could probably commit your own container image in your local environment.
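The terminal steps above come down to three docker commands, sketched here as a small Python script that only builds and prints them. It assumes anaconda_with_spark is the container name (it may instead be the image name, so check with docker ps first) and that bash exists in the container. The helper names are made up for illustration.

```python
import shlex


def docker_cp_command(local_path, container, dest_dir="/opt/notebooks/assets"):
    """Build a `docker cp` command that copies a Mac file into the container."""
    return ["docker", "cp", local_path, f"{container}:{dest_dir}/"]


def docker_shell_command(container, shell="bash"):
    """Build a `docker exec` command that opens an interactive shell."""
    return ["docker", "exec", "-it", container, shell]


# Assumption: the name seen in this post; verify it with `docker ps`.
container = "anaconda_with_spark"

commands = [
    ["docker", "ps", "--format", "{{.Names}}"],  # list running container names
    docker_cp_command("list.txt", container),    # register a file by hand
    docker_shell_command(container),             # look around inside
]
for cmd in commands:
    print(shlex.join(cmd))
```

The script prints the commands rather than running them, so they can be checked and pasted into the Mac terminal one at a time.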