This article describes what I researched about PartiQL. I think that there are deficiencies such as omissions in the content, so in that case I would appreciate it if you could point out with Edit Request or comment. This article was written as of November 7, 2019. PartiQL has just been announced, and I think that the implementation will continue to progress and the description may not be appropriate, so please forgive that point.
PartiQL is a SQL compatible language published by Amazon as Open Source. As a feature, it can have a wide range of DB forms to be controlled, and it seems that it can handle nested data such as JSON in addition to RDB.
Announcing PartiQL, one query language for your relational and non-relational data: https://t.co/ykrFrrOFEO pic.twitter.com/VN2FLJkvqN
— AWS Open Source (@AWSOpen) August 1, 2019
For the actual operation, please refer to Tutorial. If you know SQL quite well, you can operate it intuitively. At this point, the Update / delete / Insert statement doesn't seem to be implemented.
Here is the main issue. This PartiQL uses PartQL instead of adding its own language as a query language for the API when adding search and statistical processing functions in the service ([SPL] in Splunk (https://docs.splunk.com) /Documentation/Splunk/8.0.0/SearchReference/UnderstandingSPLsyntax)) There seems to be a use case. So, although it is still an alpha version, I connected it to MongoDB and actually tried it.
First, install PartiQL. If you look at Getting Start on the official website, you can install it without any problems. In addition, JRE is required as a prerequisite. This has been confirmed to work with OpenJRE9.
You can try PartiQL in REPL format by executing the command and interactively executing the user's input.
$ ./bin/partiql
Welcome to the PartiQL REPL!
PartiQL>
However, it seems that no further implementation has been done at this stage, and there is no particular connector with MongoDB.
Therefore, in this article, we will use Python to create a PoC that outputs the result of executing partiql from the Subprocess module, using the PartiQL input query from the user and the data acquired from the DB in the Pymongo module as arguments.
The following is the actual PoC prototype. It takes information from MongoDB and displays the result of PariQL query search. (No error handling is performed)
partiQLCtl.py
import subprocess
import json
import os
import pymongo
class PartiQLCtl():
def __init__(self, address=None, port=None):
# address,If there is no description of port, search by direct input of search source data JSON
if (address is not None) and (port is not None):
self.client = pymongo.MongoClient(address, port)
else:
self.client = None
def load_json_data(self, json_data):
#Search source data JSON Function for direct input
self.env_data = self.convert_json_to_partiql(json_data)
print(self.env_data)
def load_db_data(self, db, collection):
#Extract search source data from MongoDB
cur = self.client[db][collection].find({})
#DB output result is data['record']Save as list below
data = {"record": []}
for record in cur:
# _id String processing
record['_id'] = str(record['_id'])
data["record"].append(record)
self.env_data = self.convert_json_to_partiql(data)
def execute(self, query):
#Function for executing a query
# -Dummy data for the i option. It is necessary to create a file in advance. The contents are"{}"Only the two-letter description of
ion_file_path = os.path.join(os.path.dirname(__file__), '../tmp/tmp.ion')
# -Source data for the e option. Create a file in this script. Change location as needed
env_file_path = os.path.join(os.path.dirname(__file__), '../tmp/env.ion')
#Executable file. Change location as needed
partiql_execute_path = os.path.join(os.path.dirname(__file__),'../dist/partiql/bin/partiql')
with open(env_file_path, 'w') as f:
f.write(self.env_data)
res = subprocess.check_output([partiql_execute_path, '-q', query, '-i', ion_file_path, '-e', env_file_path])
return res
def convert_json_to_partiql(self, json):
#A function that changes JSON to the data format for PartiQL. list/dict/boolean/int/Supports str type
if type(json) is list:
env = "<<"
for idx, elem in enumerate(json):
if (type(elem) is dict) or (type(elem) is list):
env += (self.convert_json_to_partiql(elem))
elif type(elem) == str:
env += "'{}'".format(elem)
elif elem is None:
env += "null"
elif elem is True:
env += "true"
elif elem is False:
env += "false"
else:
env += str(elem)
if idx != len(json) - 1:
env += ', '
env += '>>'
elif type(json) is dict:
env = '{'
for idx, elem in enumerate(json.keys()):
if (type(json[elem]) is dict) or (type(json[elem]) is list):
env += "'{}': {}".format(elem, self.convert_json_to_partiql(json[elem]))
elif type(json[elem]) == str:
env += "'{}': '{}'".format(elem, json[elem])
elif json[elem] is None:
env += "'{}': null".format(elem)
elif json[elem] is True:
env += "'{}': true".format(elem)
elif json[elem] is False:
env += "'{}': false".format(elem)
else:
env += "'{}': {}".format(elem, str(json[elem]))
if idx != len(json.keys()) - 1:
env += ', '
env += '}'
return env
if __name__ == '__main__':
pql = PartiQLCtl("192.168.1.10", 27017)
pql.load_db_data("test", "test")
print(pql.execute("select r.id from record r"))
In addition, the following data is stored in MongoDB in advance.
> use test
switched to db test
> db.test.insert({"id": "aa", "setting": [{"config1": "hoge", "config2": "fuga"}]})
WriteResult({ "nInserted" : 1 })
> db.test.insert({"id": "bb", "setting": [{"config1": "hoge2", "config2": "fuga2"}]})
The execution result is as follows.
$ python partiQlCtl.py
b"<<\n {\n 'id': 'aa'\n },\n {\n 'id': 'bb'\n }\n>>"
Try to see that the display data is changed by changing the last pql.execute argument (PartiQl query).
When you execute the help of partiql, it seems that you can also execute it as a command other than REPL as shown below.
$ ./bin/partiql -h
PartiQL CLI
Command line interface for executing PartiQL queries. Can be run in an interactive (REPL) mode or non-interactive.
Examples:
To run in REPL mode simply execute the executable without any arguments:
partiql
In non-interactive mode we use Ion as the format for input data which is bound to a global variable
named "input_data", in the example below /logs/log.ion is bound to "input_data":
partiql --query="SELECT * FROM input_data" --input=/logs/log.ion
The cli can output using PartiQL syntax or Ion using the --output-format option, e.g. to output binary ion:
partiql --query="SELECT * FROM input_data" --output-format=ION_BINARY --input=/logs/log.ion
To pipe input data in via stdin:
cat /logs/log.ion | sqlcli --query="SELECT * FROM input_data" --format=ION_BINARY > output.10n
Option Description
------ -----------
-e, --environment <File> initial global environment (optional)
-h, --help prints this help
-i, --input <File> input file, requires the query option (default: stdin)
-o, --output <File> output file, requires the query option (default: stdout)
--of, --output-format <OutputFormat: output format, requires the query option (default: PARTIQL)
(ION_TEXT|ION_BINARY|PARTIQL)>
-q, --query <String> PartiQL query, triggers non interactive mode
It seems that you can execute a query search on the shell by executing the following command.
partiql --query="SELECT * FROM input_data" --input=/logs/log.ion
There is a statement here that --input can only use the ION format. For those who don't know here, ION format is a data serialization format created by Amazon, and is a JSON-like Text format description. Besides, it is written in binary format. Module in Python is also provided, so you can convert JSON ⇔ ION by using this.
>>> import amazon.ion.simpleion as ion
>>>
>>> obj = ion.loads('{"id" : "aa", "setting" : [ { "config1" : "hoge", "config2" : "fuga" } ] }')
'$ion_1_0 {id:"aa",setting:[{config1:"hoge",config2:"fuga"}]}'
When you actually save the output result as test.ion and execute it, it will be as follows.
$ cat test.ion
$ion_1_0 {id:"aa",setting:[{config1:"hoge",config2:"fuga"}]}
$ ./bin/partiql -q "select * from input_data" -i test.ion
<<
{
'id': 'aa',
'setting': [
{
'config1': 'hoge',
'config2': 'fuga'
}
]
}
>>
However, when I tried a few things here, when I entered nested data in the from clause, an unexpected event occurred in which the output result was blank. Here, I have given up on pursuing this method.
$ ./bin/partiql -q "select * from input_data.setng" -i test.ion
<<
{}
>>
However, exchanging the output result of REPL by connecting PIPE with Python's Subprocess Popen is not very clean as a code, so we will consider another method.
There is a method to input the information of the search source of the input here with the -e option other than the -i option. (In this case, the data to be input will be PartiQL original format data, so it will be necessary to convert to JSON⇔PariQL original format.) However, with the -q option and -e option, input information is waited for from standard input, so when executed, it will be in the user input waiting state. Finally, I achieved the original purpose by having the -i option read dummy data as a dummy.
Recommended Posts