Redash supports many data sources by default. Furthermore, even if some data sources are not supported, you can add new data sources by writing query_runner in Python. You can write query_runner straightforwardly if you have a Python library to connect to the target data source, but some DBs may only provide JDBC drivers. Since the JDBC driver can only be called from the JVM language, it cannot be directly incorporated into redash's query_runner. So I devised a way to incorporate the JDBC driver into redash, and this article will show you how to do it.

Library to use

This time it is a library for calling JDBC driver from Python. Use JayDeBeApi. https://github.com/baztian/jaydebeapi

This library uses JPype under the CPython environment to bridge between Python and Java. https://github.com/jpype-project/jpype

There is Py4J in the library for calling Java from Python, and there is an introductory article on Qiita, but in comparison, JPype seems to be lighter because it does not need to start a Java process.

https://www.py4j.org/ https://qiita.com/riverwell/items/e90cbbfdac439e6e9d30

Incorporate JDBC driver into redash

This time we will embed the JDBC driver in redash v5.0.2. The JDBC driver to be included is from MySQL, and you can download it from the following URL. https://dev.mysql.com/downloads/connector/j/5.1.html

Since redash supports MySQL as standard, there is no need to forcibly incorporate the JDBC driver, but since it is a widely popular DB, it is used for explanation.

jdbc.py Next, write query_runner. How to write query_runner is well organized in the following articles. https://discuss.redash.io/t/creating-a-new-query-runner-data-source-in-redash/347

I created the following query_runner by referring to the above article.

import json
import logging
import os
import jaydebeapi

from redash.query_runner import *
from redash.settings import parse_boolean
from redash.utils import JSONEncoder

logger = logging.getLogger(__name__)
types_map = {
    jaydebeapi.STRING: TYPE_STRING,
    jaydebeapi.TEXT: TYPE_STRING,
    jaydebeapi.NUMBER: TYPE_INTEGER,
    jaydebeapi.FLOAT: TYPE_FLOAT,
    jaydebeapi.DECIMAL: TYPE_FLOAT,
    jaydebeapi.DATE: TYPE_DATE,
    jaydebeapi.DATETIME: TYPE_DATETIME,
    jaydebeapi.ROWID: TYPE_STRING,
}

class Jdbc(BaseSQLQueryRunner):
    noop_query = "SELECT 1"

    @classmethod
    def configuration_schema(cls):
        schema = {
            'type': 'object',
            'properties': {
                'host': {
                    'type': 'string',
                },
                'port': {
                    'type': 'number',
                    'default': 3306,
                },
                'user': {
                    'type': 'string'
                },
                'password': {
                    'type': 'string',
                },
                'database': {
                    'type': 'string',
                },
            },
            "order": ['host', 'port', 'user', 'password', 'database'],
            'required': ['host', 'port', 'user', 'database'],
            'secret': ['password']
        }

        return schema

    @classmethod
    def name(cls):
        return "jdbc"

    @classmethod
    def enabled(cls):
        try:
            import jaydebeapi
        except ImportError:
            return False

        return True

    def _get_tables(self, schema):
        query = """
        SELECT col.table_schema,
               col.table_name,
               col.column_name
        FROM `information_schema`.`columns` col
        WHERE col.table_schema NOT IN ('information_schema', 'performance_schema', 'mysql');
        """

        results, error = self.run_query(query, None)
        if error is not None:
            raise Exception("Failed getting schema.")

        results = json.loads(results)

        for row in results['rows']:
            if row['TABLE_SCHEMA'] != self.configuration['database']:
                table_name = u'{}.{}'.format(row['TABLE_SCHEMA'], row['TABLE_NAME'])
            else:
                table_name = row['TABLE_NAME']

            if table_name not in schema:
                schema[table_name] = {'name': table_name, 'columns': []}

            schema[table_name]['columns'].append(row['COLUMN_NAME'])

        return schema.values()

    def run_query(self, query, user):
        import jaydebeapi

        connection = None
        try:
            host = self.configuration.get('host', '')
            port = self.configuration.get('port', '')
            user = self.configuration.get('user', '')
            password = self.configuration.get('password', '')
            database = self.configuration.get('database', '')

            jclassname = 'com.mysql.jdbc.Driver'
            url = 'jdbc:mysql://{}:{}/{}'.format(host, port, database)
            driver_args = {'user': user, 'password': password}
            jar_path = '/app/redash/query_runner/mysql-connector-java-5.1.47.jar'
            connection = jaydebeapi.connect(jclassname, url, driver_args, jar_path)

            cursor = connection.cursor()
            logger.info("JDBC running query: %s", query)
            cursor.execute(query)

            data = cursor.fetchall()

            if cursor.description is not None:
                columns = self.fetch_columns([(i[0], types_map.get(i[1], None)) for i in cursor.description])
                rows = [dict(zip((c['name'] for c in columns), row)) for row in data]

                data = {'columns': columns, 'rows': rows}
                json_data = json.dumps(data, cls=JSONEncoder)
                error = None
            else:
                json_data = None
                error = "No data was returned."

            cursor.close()
        except jaydebeapi.Error as e:
            json_data = None
            error = str(e.message)
        except KeyboardInterrupt:
            cursor.close()
            error = "Query cancelled by user."
            json_data = None
        finally:
            if connection:
                connection.close()

        return json_data, error

register(Jdbc)

I think that most of the processing except configuration_schema can be used in common regardless of which JDBC driver is used.

Arrangement of various files

Place the jdbc.py created in the above procedure and the JDBC driver jar file in redash / query_runner.

Dockerfile Finally, create a Docker container to run Redash. Since JRE is required for JPype to work, patch the redash standard Dockerfile. Also, create requirements.txt for installing jaydebeapi.


FROM redash/base:latest

#Added JRE installation
RUN apt-get update && apt-get install -y openjdk-8-jre

COPY redash/requirements.txt redash/requirements_dev.txt redash/requirements_all_ds.txt requirements_jaydebeapi.txt ./
RUN pip install -r requirements.txt -r requirements_dev.txt -r requirements_all_ds.txt -r requirements_jaydebeapi.txt

COPY . ./
RUN npm install && npm run build && rm -rf node_modules
RUN chown -R redash /app
USER redash

ENTRYPOINT ["/app/bin/docker-entrypoint"]
CMD ["server"]

`requirements.txt`


JayDeBeApi==1.1.1
JPype1==0.6.3

Start redash

At this point, you can start redash. In addition, in order to load the added query_runner into redash, it is necessary to add redash.query_runner.jdbc to the REDASH_ADDITIONAL_QUERY_RUNNERS environment variable and start it.

If you start redash and look at the DataSource screen, you can see that a data source called JDBC has been added.

At the moment, it looks a bit disappointing because there is no icon image showing the data source. To put an image in this part, place the image jdbc.png in client / app / assets / images / db-logos and build the container again.

Execute a query

Now that we have registered the JDBC driver as query_runner, let's run the query.

Summary

By using JayDeBeApi I was able to use the JDBC driver with redash. With this, the variety of redash data sources has expanded, and you can have a wonderful BI life.

[JAVA] How to use JDBC driver with Redash