- Python only supports a single constructor per class, the __init__ method.
- Use @classmethod to define alternative constructors for your classes.
Effective Python
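As a minimal sketch of the @classmethod point above: the Celsius class and its from_fahrenheit method are hypothetical names invented for illustration, not taken from the book. The class method acts as a second constructor, and because it builds the object through cls, subclasses get the alternative constructor for free.

```python
class Celsius:
    """Hypothetical example class: a temperature stored in degrees Celsius."""

    def __init__(self, degrees):
        # The single "real" constructor.
        self.degrees = degrees

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Alternative constructor: convert, then delegate to __init__ via cls.
        return cls((fahrenheit - 32) * 5 / 9)


class LabReading(Celsius):
    """Hypothetical subclass; it inherits the alternative constructor."""


boiling = Celsius.from_fahrenheit(212)
reading = LabReading.from_fahrenheit(32)

print(boiling.degrees)          # 100.0
print(type(reading).__name__)   # LabReading
```

Calling cls(...) rather than Celsius(...) is what lets LabReading.from_fahrenheit return a LabReading instance.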
Write MapReduce
Bad example.
Although InputData and Worker are written as base classes, the helper functions create_worker and generate_inputs hard-code the concrete subclasses they instantiate. To run mapreduce with a different input or worker class, you would have to rewrite those functions, which is not ideal.
import os
from tempfile import TemporaryDirectory
from threading import Thread


class InputData:
    def read(self):
        raise NotImplementedError


class PathInputData(InputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        with open(self.path) as f:
            return f.read()


class Worker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError


class LineCountWorker(Worker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result


# Hard-codes PathInputData: another input class needs another function.
def generate_inputs(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))


# Hard-codes LineCountWorker: another worker class needs another function.
def create_worker(input_list):
    workers = []
    for input_data in input_list:
        workers.append(LineCountWorker(input_data))
    return workers


def execute(workers):
    # Run every map step in its own thread, then fold the results together.
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    first, rest = workers[0], workers[1:]
    for worker in rest:
        first.reduce(worker)
    return first.result


def mapreduce(data_dir):
    inputs = generate_inputs(data_dir)
    workers = create_worker(inputs)
    return execute(workers)


# write_test_files is a helper that is not shown in these notes; it is
# assumed to fill tmpdir with sample text files.
with TemporaryDirectory() as tmpdir:
    write_test_files(tmpdir)
    result = mapreduce(tmpdir)
Good practice
The main difference here is the use of @classmethod to give every class a common constructor interface, much like the factory pattern. The mapreduce function accepts classes and instantiates them by calling their class methods, which keeps the code modular and tidy.
import os
from tempfile import TemporaryDirectory
from threading import Thread


class GenericInputData:
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError


class PathInputData(GenericInputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        with open(self.path) as f:
            return f.read()

    @classmethod
    def generate_inputs(cls, config):
        # Alternative constructor: one instance per file in data_dir.
        data_dir = config['data_dir']
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))


class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        # cls is whichever worker subclass the caller passed to mapreduce.
        workers = []
        for input_data in input_class.generate_inputs(config):
            workers.append(cls(input_data))
        return workers


class LineCountWorker(GenericWorker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result


def execute(workers):
    # Run every map step in its own thread, then fold the results together.
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    first, rest = workers[0], workers[1:]
    for worker in rest:
        first.reduce(worker)
    return first.result


def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)


# write_test_files is a helper that is not shown in these notes; it is
# assumed to fill tmpdir with sample text files.
with TemporaryDirectory() as tmpdir:
    write_test_files(tmpdir)
    config = {'data_dir': tmpdir}
    result = mapreduce(LineCountWorker, PathInputData, config)
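To see what the generic version buys you, here is a self-contained sketch that reuses the same mapreduce shape with two classes that do not appear above. StringInputData, WordCountWorker and the 'texts' config key are hypothetical names made up for this illustration. Neither mapreduce nor execute needs to change to support them.

```python
from threading import Thread


class GenericInputData:
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError


class StringInputData(GenericInputData):
    """Hypothetical input class: serves text held in memory, not files."""

    def __init__(self, text):
        super().__init__()
        self.text = text

    def read(self):
        return self.text

    @classmethod
    def generate_inputs(cls, config):
        # 'texts' is an assumed config key holding a list of strings.
        for text in config['texts']:
            yield cls(text)


class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        return [cls(data) for data in input_class.generate_inputs(config)]


class WordCountWorker(GenericWorker):
    """Hypothetical worker: counts whitespace-separated words."""

    def map(self):
        self.result = len(self.input_data.read().split())

    def reduce(self, other):
        self.result += other.result


def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    first, rest = workers[0], workers[1:]
    for worker in rest:
        first.reduce(worker)
    return first.result


def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)


config = {'texts': ['one two three', 'four five', 'six']}
total_words = mapreduce(WordCountWorker, StringInputData, config)
print(total_words)  # 6
```

Swapping the job is now a matter of passing different classes to mapreduce, with no edits to the glue code.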
Polymorphism
http://stackoverflow.com/questions/1031273/what-is-polymorphism-what-is-it-for-and-how-is-it-used
If you think about the Greek roots of the term, it should become obvious.
Poly = many: polygon = many-sided, polystyrene = many styrenes (a), polyglot = many languages, and so on. Morph = change or form: morphology = study of biological form, Morpheus = the Greek god of dreams able to take any form. So polymorphism is the ability (in programming) to present the same interface for differing underlying forms (data types).
For example, integers and floats are implicitly polymorphic since you can add, subtract, multiply and so on, irrespective of the fact that the types are different. They're rarely considered as objects in the usual term.
But, in that same way, a class like BigDecimal or Rational or Imaginary can also provide those operations, even though they operate on different data types.
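A small Python sketch of that idea, using the standard library's Fraction and Decimal as stand-ins for the Rational and BigDecimal classes mentioned above (the double function is a hypothetical helper written for this example). The same + operator works across all four types because each type supplies its own implementation.

```python
from decimal import Decimal
from fractions import Fraction


def double(value):
    # Relies only on the shared "+" interface, not on the concrete type.
    return value + value


values = [2, 1.5, Fraction(1, 3), Decimal('0.1')]
results = [double(v) for v in values]

for value, result in zip(values, results):
    print(type(value).__name__, result)
```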
The classic example is the Shape class and all the classes that can inherit from it (square, circle, dodecahedron, irregular polygon, splat and so on).
With polymorphism, each of these classes will have different underlying data. A point shape needs only two co-ordinates (assuming it's in a two-dimensional space of course). A circle needs a center and radius. A square or rectangle needs two co-ordinates for the top left and bottom right corners (and possibly a rotation). An irregular polygon needs a series of lines.
And, by making the class responsible for its code as well as its data, you can achieve polymorphism. In this example, every class would have its own Draw() function and the client code could simply do:
shape.Draw()
to get the correct behavior for any shape.
This is in contrast to the old way of doing things in which the code was separate from the data, and you would have had functions such as drawSquare() and drawCircle().
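The Shape example can be sketched in Python roughly as follows. The class layout and the render helper are assumptions made for illustration, and draw returns a description string instead of actually drawing, so the result is easy to inspect.

```python
class Shape:
    def draw(self):
        raise NotImplementedError


class Point(Shape):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def draw(self):
        return f"point at ({self.x}, {self.y})"


class Circle(Shape):
    def __init__(self, center, radius):
        self.center = center
        self.radius = radius

    def draw(self):
        return f"circle at {self.center} with radius {self.radius}"


class Rectangle(Shape):
    def __init__(self, top_left, bottom_right):
        self.top_left = top_left
        self.bottom_right = bottom_right

    def draw(self):
        return f"rectangle from {self.top_left} to {self.bottom_right}"


def render(shapes):
    # Client code: one call, no type checks, no draw_circle / draw_square.
    return [shape.draw() for shape in shapes]


descriptions = render([Point(0, 0), Circle((1, 1), 2), Rectangle((0, 2), (3, 0))])
for line in descriptions:
    print(line)
```

Each class stores different data, yet render never needs to know which shape it is holding.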
Object orientation, polymorphism and inheritance are all closely-related concepts and they're vital to know. There have been many "silver bullets" during my long career which basically just fizzled out but the OO paradigm has turned out to be a good one. Learn it, understand it, love it - you'll be glad you did :-)
(a) I originally wrote that as a joke but it turned out to be correct and, therefore, not that funny. The monomer styrene happens to be made from carbon and hydrogen, C8H8, and polystyrene is made from groups of that, (C8H8)n.
Perhaps I should have stated that a polyp was many occurrences of the letter p although, now that I've had to explain the joke, even that doesn't seem funny either.