Given sensors and actuators, it must process and combine its sensory inputs and generate choreographed actuator-activation signals that tell its parts what to do.
Spatial representation is essential. The system must represent empty space; things in space; itself, its parts, and their relationships in space; the evolving relations between itself, its parts, and other things in space; and characteristics such as density, texture, and softness, which must be encoded along with space. The physical model of course applies to everything in space, and the spatial representation ought to be close enough to that physics that physical action is learnable.
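As a minimal sketch of what "encoded with space" could mean, here is one possible representation, assuming a sparse voxel grid keyed by integer cell indices; the names (Voxel, SpatialMap) and the chosen attributes are hypothetical, not a prescribed design:

```python
from dataclasses import dataclass, field

@dataclass
class Voxel:
    """Attributes encoded with one cell of space (fields chosen for illustration)."""
    occupied: bool = False
    density: float = 0.0
    texture: str = ""
    softness: float = 0.0

@dataclass
class SpatialMap:
    """Sparse voxel map: empty space is simply any key that is not present."""
    resolution: float = 0.05  # metres per cell
    cells: dict = field(default_factory=dict)

    def key(self, x, y, z):
        r = self.resolution
        return (int(x // r), int(y // r), int(z // r))

    def mark(self, x, y, z, **attrs):
        # Record that this cell is occupied and attach whatever attributes we know.
        v = self.cells.setdefault(self.key(x, y, z), Voxel(occupied=True))
        for name, value in attrs.items():
            setattr(v, name, value)

    def at(self, x, y, z):
        # A missing key means the cell has never been observed as occupied.
        return self.cells.get(self.key(x, y, z), Voxel())
```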
Suppose then that you are responsible for building a space-capable system, that is, a system that depends on things in space, so that it has to have, in some sense, some knowledge or awareness of space and what's in space. Indeed your system might be a robot, a drone, a computer vision algorithm, a path planner, an obstacle avoider, a more or less self-driving vehicle, a geographic search or match system, a tracking or targeting system, a hunter-seeker.
In spatial applications quality means precision, but precision is not just sensor-limited but compute-limited, and the limits of computation are defined by algorithmic complexity. Those who use substantially worse algorithms, hampered by large and unnecessary inefficiencies, are likely to be left behind, even to be destroyed in warfare by those who use better ones.
Typically your system will take observations or make measurements yielding data about what is where. In general such systems carry out several levels of analysis and computation: flagging points that lie far from all others (outliers), fitting surfaces such as the ground, grouping nearby points into clusters, recognizing distinctive patterns of points (classification), and estimating the sensor's own motion between frames.
Each of these levels, modules, or algorithms has to take regions or points and search for and find other points in or near them, or determine that the region is empty. K-nearest-neighbor, for example, is a search across the entire observed universe to select the K other points that are closest. (Out of a million points, it can be a long search, especially if the dimensions are encoded separately.)
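To make the cost concrete, here is a minimal sketch, assuming NumPy and SciPy are available and using synthetic points, that compares a brute-force K-nearest-neighbor scan over a million 3D points against the same query answered by a KD-tree spatial index; only the brute-force pass has to touch every point.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(-50.0, 50.0, size=(1_000_000, 3))  # one million 3D points
query = np.array([1.0, 2.0, 0.5])
k = 8

# Brute force: compute the distance from the query to every point in the observed universe.
d2 = np.sum((points - query) ** 2, axis=1)
brute_idx = np.argpartition(d2, k)[:k]          # indices of the k closest points

# Spatial index: build once, then each k-NN query visits only nearby cells of the tree.
tree = cKDTree(points)
dist, tree_idx = tree.query(query, k=k)

assert set(brute_idx) == set(tree_idx)          # both find the same k neighbors
```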
Now each of these levels of analysis depends essentially on spatial search, so that points can be seen as far from others (outliers), within a surface along with others (such as the ground), near to or far from others (clustering), forming a distinctive pattern with others (classification), and so on.
(By the way, if your particular software approach fails to modularize, that is, to separate out the spatial search problem, you can be assured it underperforms: search is faster when you can search for many things at once, since finding one point while searching for another means you don't have to repeat the whole search for each.)
Since precision is compute-limited and computation is bounded by algorithmic complexity, if you are to succeed through improved quality of spatial systems you must make the optimum methods for spatial search your own.
Imagine a sequence of LIDAR scan frames at times t0, t1, etc. Each point is taken as a distance and two angles relative to the orientation of the scanner. Perhaps tens, hundreds, or thousands of 3D points in the scanner's frame of reference make up a single scan.
Or imagine a stereo video camera, which similarly provides a distance and two angles for each pixel of each frame across a sequence of equally spaced times.
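Here is a small sketch of what one such frame looks like once converted to Cartesian coordinates, assuming a conventional azimuth/elevation parameterization; real sensors differ in their angle conventions and frame layout.

```python
import numpy as np

def scan_to_points(ranges, azimuths, elevations):
    """Convert one scan of (distance, two angles) measurements into 3D points
    in the scanner's own frame of reference.

    Assumes azimuth is measured in the horizontal plane about the 'up' axis and
    elevation is measured up from that plane, with all angles in radians.
    """
    r = np.asarray(ranges, dtype=float)
    az = np.asarray(azimuths, dtype=float)
    el = np.asarray(elevations, dtype=float)
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([x, y, z], axis=-1)   # shape (N, 3), one row per return
```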
There may be a tradeoff between point precision and transform calculation time.
The path task would be to find the low-dimensional transform of the camera through space between frames. The transform involves a 3D translation and rotation for the movement of the camera's center and orientation relative to the previous frame of reference: a 3D shift plus, say, a unit quaternion for the rotation/orientation of its focal axis and "up" normal vector, thus 7 values to estimate between each successive pair of frames of 50-4000 angle/distance points.
(This whole job is mainly a search-for-the-closest-point problem.)
Code this up, Tom, it'll be fun!
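In that spirit, here is a hedged sketch rather than a production implementation: a bare-bones ICP-style loop in Python (NumPy/SciPy) in which a KD-tree supplies the closest-point correspondences and a Kabsch/SVD step yields the incremental rotation and translation; the function name and iteration count are my own assumptions, and a real system would add outlier rejection and a motion prior.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def estimate_frame_transform(prev_pts, curr_pts, iterations=20):
    """Estimate the rigid transform taking the current frame's points into the
    previous frame's coordinates. Assumes the frames overlap substantially and
    the motion between them is small. Returns (translation, quaternion [x,y,z,w]).
    """
    R = np.eye(3)
    t = np.zeros(3)
    tree = cKDTree(prev_pts)                    # the search-for-the-closest-point part
    for _ in range(iterations):
        moved = curr_pts @ R.T + t
        _, idx = tree.query(moved)              # nearest previous point for each current point
        matched = prev_pts[idx]
        # Kabsch/SVD step: best rotation aligning the two centred point sets.
        mu_m, mu_p = moved.mean(axis=0), matched.mean(axis=0)
        H = (moved - mu_m).T @ (matched - mu_p)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R_step = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t_step = mu_p - R_step @ mu_m
        # Compose the incremental step with the running estimate.
        R = R_step @ R
        t = R_step @ t + t_step
    quat = Rotation.from_matrix(R).as_quat()    # 4 values; with the 3 of translation, 7 in all
    return t, quat
```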
Clearly a competent robot must build internal representations of space and what's in it, and may do so thus: a transitive inverse-kinematics tree can integrate the "Self" and "Touched" data sources, while "Seen" can be integrated using the SLAM techniques above.
At least the former may be enabled across parts designed, built, and sold by independent suppliers, if those suppliers follow standards for spatial-information representation, processing, and communication, so that parts can be joined together and interoperate effectively.
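To make the kinematic-tree idea concrete, here is a minimal sketch, with hypothetical names and transforms, of a tree of parts whose parent-to-child transforms are composed so that a point sensed or touched in any part's local frame can be expressed in the robot's own "Self" frame:

```python
import numpy as np

class KinematicTree:
    """Toy kinematic tree: each part stores a 4x4 homogeneous transform from its
    own frame into its parent's frame, so a point known in any part's local
    frame can be re-expressed in the root ('self') frame by composing transforms
    up the tree."""

    def __init__(self):
        self.parent = {"self": None}
        self.to_parent = {"self": np.eye(4)}

    def add_part(self, name, parent, to_parent):
        self.parent[name] = parent
        self.to_parent[name] = np.asarray(to_parent, dtype=float)

    def to_root(self, name):
        T = np.eye(4)
        while name is not None:                 # walk up to the root, composing transforms
            T = self.to_parent[name] @ T
            name = self.parent[name]
        return T

    def express_in_root(self, name, point_local):
        p = np.append(np.asarray(point_local, dtype=float), 1.0)
        return (self.to_root(name) @ p)[:3]

# Example: a head mounted on a torso mounted on the body (all transforms hypothetical).
tree = KinematicTree()
tree.add_part("torso", "self", np.eye(4))
head = np.eye(4); head[2, 3] = 0.4              # head frame 0.4 m above the torso
tree.add_part("head", "torso", head)
print(tree.express_in_root("head", [0.0, 0.0, 0.1]))  # -> [0. 0. 0.5]
```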