Break up the pybind11 interface into separate compilation units (not as beneficial as hoped); still very slow to compile
Check length of compositions, output type
I think it's at about 0.4 µs/call on this machine
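A rough way to re-check the per-call figure is sketched below. It assumes the timing is taken from the Python side with the standard `timeit` module. `per_call_microseconds` and `call_under_test` are hypothetical names introduced only for illustration, and the stand-in body should be replaced with the actual bound call.

```python
import timeit


def per_call_microseconds(func, number=200_000, repeat=5):
    """Return the best-of-`repeat` mean time per call, in microseconds."""
    timer = timeit.Timer(func)
    # Each repeat runs `func` `number` times and returns the total seconds.
    # Taking the minimum reduces noise from other processes on the machine.
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6


def call_under_test():
    # Hypothetical stand-in for the bound function; replace with the real call.
    return len("abc")


if __name__ == "__main__":
    print(f"{per_call_microseconds(call_under_test):.3f} us/call")
```

The number includes the overhead of timeit's own loop and of the Python-level wrapper call, so at this scale it is best read as an upper bound on the cost of the binding itself.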