In reality, people's attention and interest in data preparation, calculation, and visualization is about 1: 4: 5, while the ratio in practice is about 6: 2: 2. This is just an experience, but is it safe to say that most of the analysis is a preparation for setting up the data in a computable state? That's why today I'll write a series of techniques that I often use in Python, which emphasizes usability as a glue language.
The content isn't particularly new, but JSON format conversion is a common pre-process. I will leave it as a memo so that I will not forget it.
In Python, JSON-formatted data structures can be handled by import json. When loaded, it becomes a dictionary (hash in other languages) data format, which can also be in JSON format.
Assume a CSV file separated by key and value. Suppose that the value part contains text data in JSON format serialized by another system. To load this, you can split the string by specifying the delimiter as follows, and load the JSON as a dictionary type object with the json.load method.
import json
file = open(self.filename, 'r')
for line in file:
key, value = line.rstrip().split(",")
dic = json.loads(value)
On the other hand, when writing a dictionary type object to a file or standard output in JSON format, use json.dumps as follows.
import json
json_obj = json.dumps(dic)
print(json_obj)
It's easy.
The arguments passed to the Python script are stored in sys.argv.
import sys
if __name__ == '__main__':
argsmin = 1
if len(sys.argv) > argsmin:
some_instance = SomeClass(sys.argv)
some_instance.some_method()
If you pass it when initializing the instance, you can use it by storing the argument in the instance variable.
class SomeClass:
def __init__(self, args):
self.filename = args[1]
Python customarily assumes that a method is a private method by prefixing it with a _.
def self._some_method(self):
...
Sure, you can't call it some_instance.some_method. But you can actually call it explicitly with some_instance._some_method. It's habitual and not functionally private.
Use two __ to prevent it from being called functionally.
def self.__some_method(self):
...
However, even with this, there is a tricky way to call it, and the testability of the private method will be reduced, so I do not recommend it very much. The fact that you can call a private method, in other words, makes it easier when testing.
There are many different Python testing frameworks, but nose is relatively easy to use. First, consider a method that behaves as follows.
import factorial
factorial.factorial(10)
#=> 3628800
Let's say this Python code is named factorial.py and the method implementation looks like this.
def factorial(n):
if n==1:
return 1
else:
return n * factorial(n-1)
To test this, create a file named test_factorial.py and write the test code as follows:
from nose.tools import * #Loading the testing framework
from factorial import * #Loading code under test
def test_factorial(): #Factorial method test
i=10
e=3628800
eq_(e,factorial(i)) #Verification
The above is equivalent to doing eq_ (3628800, factorial (10)) The eq_ method verifies that the values are equal.
After implementing the test code, issue the nosetests command from the shell.
$ nosetests
.
----------------------------------------------------------------------
Ran 1 test in 0.008s
OK
It's easy to spend a lot of time preparing steady data. This time, I have summarized the techniques often used in such work as a memorandum.
Recommended Posts