Given sensors and actuators, it must process and combine its sensory inputs and generate choreographed actuator-activation signals that tell its parts what to do.
Spatial representation is essential. The system must represent empty space; things in space; itself, its parts, and their relationships in space; the evolving relations between itself, its parts, and other things in space; and characteristics such as density, texture, and softness, which must be encoded along with space. The physical model of course applies to everything in space, and the spatial representation ought to be close enough to that physics that physical action is learnable.
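As a minimal sketch of what "encoded with space" could mean, here is one possible representation, assuming a sparse voxel grid keyed by integer cell indices; the names (Voxel, SpatialMap) and the chosen attributes are hypothetical, not a prescribed design:

```python
from dataclasses import dataclass, field

@dataclass
class Voxel:
    """Attributes encoded with one cell of space (fields chosen for illustration)."""
    occupied: bool = False
    density: float = 0.0
    texture: str = ""
    softness: float = 0.0

@dataclass
class SpatialMap:
    """Sparse voxel map: empty space is simply any key that is not present."""
    resolution: float = 0.05  # metres per cell
    cells: dict = field(default_factory=dict)

    def key(self, x, y, z):
        r = self.resolution
        return (int(x // r), int(y // r), int(z // r))

    def mark(self, x, y, z, **attrs):
        # Record that this cell is occupied and attach whatever attributes we know.
        v = self.cells.setdefault(self.key(x, y, z), Voxel(occupied=True))
        for name, value in attrs.items():
            setattr(v, name, value)

    def at(self, x, y, z):
        # A missing key means the cell has never been observed as occupied.
        return self.cells.get(self.key(x, y, z), Voxel())
```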
Suppose then that you are responsible for building a space-capable system, that is, a system that depends on things in space, so that it has to have, in some sense, some knowledge or awareness of space and what's in space. Indeed your system might be a robot, a drone, a computer vision algorithm, a path planner, an obstacle avoider, a more or less self-driving vehicle, a geographic search or match system, a tracking or targeting system, a hunter-seeker.
In spatial applications quality means precision, but precision is not just sensor-limited but compute-limited, and the limits of computation are defined by algorithmic complexity. Those who use substantially worse algorithms, hampered by large and unnecessary inefficiencies, are likely to be left behind, even to be destroyed in warfare by those who use better ones.
Typically your system will take observations or make measurements yielding data about what is where. In general such systems carry out several levels of analysis and computation: flagging points that lie far from all others (outliers), fitting surfaces such as the ground, grouping nearby points into clusters, recognizing distinctive patterns of points (classification), and estimating the sensor's own motion between frames.
Each of these levels, modules, or algorithms has to take regions or points and search for and find other points in or near them, or determine that the region is empty. K-nearest-neighbor, for example, is a search across the entire observed universe to select the K other points that are closest. (Out of a million points, it can be a long search, especially if the dimensions are encoded separately.)
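To make the cost concrete, here is a minimal sketch, assuming NumPy and SciPy are available and using synthetic points, that compares a brute-force K-nearest-neighbor scan over a million 3D points against the same query answered by a KD-tree spatial index; only the brute-force pass has to touch every point.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(-50.0, 50.0, size=(1_000_000, 3))  # one million 3D points
query = np.array([1.0, 2.0, 0.5])
k = 8

# Brute force: compute the distance from the query to every point in the observed universe.
d2 = np.sum((points - query) ** 2, axis=1)
brute_idx = np.argpartition(d2, k)[:k]          # indices of the k closest points

# Spatial index: build once, then each k-NN query visits only nearby cells of the tree.
tree = cKDTree(points)
dist, tree_idx = tree.query(query, k=k)

assert set(brute_idx) == set(tree_idx)          # both find the same k neighbors
```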
Now each of these levels of analysis depends essentially on spatial search, so that points can be seen as far from others (outliers), within a surface along with others (such as the ground), near to or far from others (clustering), forming a distinctive pattern with others (classification), and so on.
(By the way, if your particular software approach fails to modularize, that is, to separate out the spatial search problem, you can be assured it underperforms: search is faster when you can search for many things at once, since finding one point while searching for another means you don't have to repeat the whole search for each.)
Since precision is compute-limited and computation is bounded by algorithmic complexity, if you are to succeed through improved quality of spatial systems you must make the optimum methods for spatial search your own.
Imagine a sequence of LIDAR scan frames at times t0, t1, etc. Each point is taken as a distance and two angles relative to the orientation of the scanner. Perhaps tens, hundreds, or thousands of 3D points in the scanner's frame of reference make up a single scan.
Or imagine a stereo video camera, which similarly provides a distance and two angles for each pixel of each frame across a sequence of equally spaced times.
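Here is a small sketch of what one such frame looks like once converted to Cartesian coordinates, assuming a conventional azimuth/elevation parameterization; real sensors differ in their angle conventions and frame layout.

```python
import numpy as np

def scan_to_points(ranges, azimuths, elevations):
    """Convert one scan of (distance, two angles) measurements into 3D points
    in the scanner's own frame of reference.

    Assumes azimuth is measured in the horizontal plane about the 'up' axis and
    elevation is measured up from that plane, with all angles in radians.
    """
    r = np.asarray(ranges, dtype=float)
    az = np.asarray(azimuths, dtype=float)
    el = np.asarray(elevations, dtype=float)
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([x, y, z], axis=-1)   # shape (N, 3), one row per return
```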
There may be a tradeoff between point precision and transform calculation time.
The path task would be to find the low-dimensional transform of the camera through space between frames. The transform involves a 3D translation and rotation for the movement of the camera's center and orientation relative to the previous frame of reference: a 3D shift plus, say, a unit quaternion for the rotation/orientation of its focal axis and "up" normal vector, thus 7 values to estimate between each successive pair of frames of 50-4000 angle/distance points.
(This whole job is mainly a search-for-the-closest-point problem.)
Code this up, Tom, it'll be fun!
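In that spirit, here is a hedged sketch rather than a production implementation: a bare-bones ICP-style loop in Python (NumPy/SciPy) in which a KD-tree supplies the closest-point correspondences and a Kabsch/SVD step yields the incremental rotation and translation; the function name and iteration count are my own assumptions, and a real system would add outlier rejection and a motion prior.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def estimate_frame_transform(prev_pts, curr_pts, iterations=20):
    """Estimate the rigid transform taking the current frame's points into the
    previous frame's coordinates. Assumes the frames overlap substantially and
    the motion between them is small. Returns (translation, quaternion [x,y,z,w]).
    """
    R = np.eye(3)
    t = np.zeros(3)
    tree = cKDTree(prev_pts)                    # the search-for-the-closest-point part
    for _ in range(iterations):
        moved = curr_pts @ R.T + t
        _, idx = tree.query(moved)              # nearest previous point for each current point
        matched = prev_pts[idx]
        # Kabsch/SVD step: best rotation aligning the two centred point sets.
        mu_m, mu_p = moved.mean(axis=0), matched.mean(axis=0)
        H = (moved - mu_m).T @ (matched - mu_p)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R_step = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t_step = mu_p - R_step @ mu_m
        # Compose the incremental step with the running estimate.
        R = R_step @ R
        t = R_step @ t + t_step
    quat = Rotation.from_matrix(R).as_quat()    # 4 values; with the 3 of translation, 7 in all
    return t, quat
```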
Clearly a competent robot must build internal representations of space and what's in it, and may do so thus: a transitive inverse-kinematics tree can integrate the "Self" and "Touched" data sources, while "Seen" can be integrated using the SLAM techniques above.
At least the former may be enabled across parts designed, built, and sold by independent suppliers, if those suppliers follow standards for spatial-information representation, processing, and communication, so that parts can be joined together and interoperate effectively.
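To make the kinematic-tree idea concrete, here is a minimal sketch, with hypothetical names and transforms, of a tree of parts whose parent-to-child transforms are composed so that a point sensed or touched in any part's local frame can be expressed in the robot's own "Self" frame:

```python
import numpy as np

class KinematicTree:
    """Toy kinematic tree: each part stores a 4x4 homogeneous transform from its
    own frame into its parent's frame, so a point known in any part's local
    frame can be re-expressed in the root ('self') frame by composing transforms
    up the tree."""

    def __init__(self):
        self.parent = {"self": None}
        self.to_parent = {"self": np.eye(4)}

    def add_part(self, name, parent, to_parent):
        self.parent[name] = parent
        self.to_parent[name] = np.asarray(to_parent, dtype=float)

    def to_root(self, name):
        T = np.eye(4)
        while name is not None:                 # walk up to the root, composing transforms
            T = self.to_parent[name] @ T
            name = self.parent[name]
        return T

    def express_in_root(self, name, point_local):
        p = np.append(np.asarray(point_local, dtype=float), 1.0)
        return (self.to_root(name) @ p)[:3]

# Example: a head mounted on a torso mounted on the body (all transforms hypothetical).
tree = KinematicTree()
tree.add_part("torso", "self", np.eye(4))
head = np.eye(4); head[2, 3] = 0.4              # head frame 0.4 m above the torso
tree.add_part("head", "torso", head)
print(tree.express_in_root("head", [0.0, 0.0, 0.1]))  # -> [0. 0. 0.5]
```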