Sensing Linear and Angular Change Through a Small Optical Window

Tim Poston     Manohar Srikanth     Prabhakar Vaidya
 

National Institute of Advanced Studies (NIAS)

Indian Institute of Science (IISc)

Abstract:

We present an algorithm to sense translations and rotations of a sliding optical window with respect to a nearby observed surface. We fit a multi-frame descriptor of smooth rigid motion to image subsequences: we use it to create transformed images from a reference one, compare these to the actual views, and apply Newton's method to minimize the sum of squared differences, solving iteratively for the polynomial coefficients in the motion descriptor.
Our results, using synthetic test images and real CCD camera images, indicate that the algorithm robustly estimates frame-to-frame motion parameters with sub-pixel precision. The rotation estimate, besides adding a degree of freedom to user interaction, supports a directionally consistent XY output that XY-only sensors cannot provide.

Introduction

The standard mouse, from [Engelbart] onward, reports movement with two degrees of freedom: sideways motion in any combination with forward-back.
It does not report rotation of the mouse, and its meanings of 'sideways' and 'forward' turn with the mouse (as they do in current optical mice). Early optical mice [Kirsch] used an XY coordinate system embedded in the pad and did not work properly when rotated: this enforced a non-rotating reporting frame by restricting the user. In a companion paper we discuss the user advantages (in 2D and 3D applications) of removing these limits, and [Dinpuii Poston Srikanth] analyses tests with users. This paper addresses the technology.

Our work on a three-degree-of-freedom mouse interoperable with standard mice includes means to track rotation by comparing multiple sensors, as in [MacKenzie et al.] and [Bohn], and also by better analysis of the data from the lens-and-camera hardware used in current XY-only optical mice. Upgrading only the algorithm and processing components means that existing mouse form factors, internal layouts, button schemes, moulds and production lines can be used unchanged. Because manufacture stays nearly the same, with little retooling, both cost and the breadth of user choice are preserved. Here we describe our algorithm less legalistically, and report early results from it.

           
[Figure: panels (a) and (b)]

A typical translation and rotation task: the green star is to be overlaid on the red one. (a) is done with a standard XY mouse, (b) with an XYθ mouse (Mushaca). (a) requires significantly more time and effort than (b).

Available data


           


As the mouse sensor moves over a wooden desktop (far left), it 'sees' a changing image, usually with both colour variation and shadows from side-lit irregularities in the surface. The image illustrated at left, however, shows far more detail than the usual 18×18 CCD array can capture. The input for motion estimation is a changing 18×18 pixel array; below is an enlarged still from real mouse data.
Conventional industrial processing (as in [Agilent] sensors, for example) identifies common features in these images to estimate the direction and amount of mouse movement. As the figure at right shows, a 'feature' here cannot be as sharply defined as, for instance, the corners of a ∇, which line fitting can locate with sub-pixel precision. Considerable ingenuity is needed to detect features robust enough to be identified from one image to the next, since each pixel is a weighted optical average of brightness over a small region of the pad, complicated by noise. One cannot (for instance) simply look for repeated grey levels.

 


Typical image from the CCD sensor

In consequence, features are located to a precision of one pixel diameter at best. This is quite adequate for a classical mouse with a 1mm window moving at 10cm/sec and sampled every millisecond, which corresponds to a displacement of about two pixels per frame. With a screen refresh rate of 60/sec, the cursor position needs to be updated only about every 17 sensor images. Summing this many integer pixel counts for displacements δX and δY suppresses enough quantization noise for a smooth user experience.
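The frame-rate arithmetic above can be checked in a few lines. This is a back-of-envelope sketch only: the constant names are ours, and the pixel pitch assumes the 1mm window is spread evenly over 18 pixels.

```python
# Back-of-envelope check of the classical-mouse numbers (all values from the text).
WINDOW_MM = 1.0          # width of the optical window
PIXELS = 18              # CCD pixels across the window
SPEED_MM_PER_S = 100.0   # 10 cm/sec hand motion
FRAME_S = 1e-3           # one sensor image per millisecond
REFRESH_HZ = 60.0        # screen refresh rate

pitch_mm = WINDOW_MM / PIXELS                         # about 0.056 mm per pixel
px_per_frame = SPEED_MM_PER_S * FRAME_S / pitch_mm    # about 1.8 pixels per frame
frames_per_refresh = 1.0 / (REFRESH_HZ * FRAME_S)     # about 16.7 frames per cursor update

print(f"pitch {pitch_mm * 1000:.1f} um, {px_per_frame:.2f} px/frame, "
      f"{frames_per_refresh:.1f} frames per refresh")
```

This prints a pitch of about 55.6 µm, 1.80 pixels per frame and 16.7 frames per screen refresh.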

However, for rotation this sensitivity is not enough. A mouse in the hand, rotated for close control, may turn as slowly as 10°/sec, and the changing angle should be reported smoothly. The difference between a pure (δX, δY) translation and a turning motion may be seen as a rotation about (say) a corner of the image. In one millisecond such a rotation turns the window by only 10°÷1000 = 0.01°, and no point in an 18×18 array moves by more than a hundredth of a pixel; nor, therefore, does any feature. Feature locations defined to within one pixel are adequate for translation detection, but suppress rotation. After the hundred or more steps needed for a rotation to change a feature's location (relative to pure-translation motion) by a whole pixel, the feature is long gone from the sensor window.
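A quick check of the rotation bound, assuming the worst case that the pivot is one corner of the window, so that the farthest pixel lies about 18√2 ≈ 25.5 pixels away. The per-frame displacement stays well under a hundredth of a pixel, so a hundred steps is a conservative lower bound; for this corner-pivot case it is nearer 225. The constant names are ours.

```python
import math

PIXELS = 18
RATE_DEG_PER_S = 10.0   # slow, careful hand rotation (from the text)
FRAME_S = 1e-3          # one frame per millisecond

dtheta = math.radians(RATE_DEG_PER_S) * FRAME_S   # rotation per frame, in radians
# Worst case: pivot at one corner, so the farthest point is the opposite corner.
r_max = math.hypot(PIXELS, PIXELS)                # about 25.5 pixels
max_shift_px = r_max * dtheta                     # arc length = radius x angle
frames_per_pixel = 1.0 / max_shift_px             # frames before even that point moves 1 px

print(f"max shift {max_shift_px:.4f} px/frame; {frames_per_pixel:.0f} frames per pixel")
```

This prints a maximum shift of about 0.0044 pixel per frame and about 225 frames per pixel.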

We estimate motion by bypassing the idea of features: use of the whole image avoids discarding data. We also compare multiple successive images for each estimate, rather than each image with the previous. This exploits more pixel data, suppressing motion jitter by fitting smooth motion functions, as well as averaging out the effects of per-pixel noise. It also allows estimates that go beyond the first derivatives (or first differences) of the motion. The result is a close fit to the actual motion, including rotation.
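To make the whole-image, multi-frame idea concrete, here is a minimal sketch, not the authors' implementation. All of the following are our illustrative assumptions: a quadratic-in-time descriptor for each of x, y and θ; predicted views made by bilinear resampling of the reference frame; a fixed interior mask so that resampled points stay inside the window; a Gauss-Newton step (a Newton-type method for sums of squares) with a finite-difference Jacobian; and a hypothetical analytic 'desk' texture standing in for real pad data. Names such as `texture`, `pose` and `fit` are hypothetical.

```python
import numpy as np

N = 18                                  # pixels across the window
CEN = (N - 1) / 2.0                     # window centre, in pixel indices
ROWS, COLS = np.mgrid[0:N, 0:N].astype(float)
PX, PY = COLS - CEN, ROWS - CEN         # pixel coordinates relative to the centre
MASK = np.zeros((N, N), dtype=bool)
MASK[3:-3, 3:-3] = True                 # interior only: small motions keep these in view


def texture(x, y):
    """Hypothetical smooth 'desk surface' brightness (stand-in for real pad data)."""
    return (np.sin(0.7 * x + 0.3 * y) + np.cos(0.2 * x - 0.8 * y)
            + 0.6 * np.sin(0.5 * x + 0.55 * y + 1.0))


def surface_coords(tx, ty, th):
    """Surface point seen by each pixel when the window is shifted by (tx, ty), turned by th."""
    c, s = np.cos(th), np.sin(th)
    return c * PX - s * PY + tx, s * PX + c * PY + ty


def view(tx, ty, th):
    """Synthetic camera image (no blur, no noise) for one window pose."""
    return texture(*surface_coords(tx, ty, th))


def bilinear(img, r, c):
    """Sample img at fractional (row, col) positions, clamped to the array."""
    r = np.clip(r, 0.0, N - 1.001)
    c = np.clip(c, 0.0, N - 1.001)
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    fr, fc = r - r0, c - c0
    return ((1 - fr) * (1 - fc) * img[r0, c0] + (1 - fr) * fc * img[r0, c0 + 1]
            + fr * (1 - fc) * img[r0 + 1, c0] + fr * fc * img[r0 + 1, c0 + 1])


def pose(coef, t):
    """Quadratic motion descriptor: (x, y, theta) at frame t, relative to the reference."""
    a1, a2, b1, b2, c1, c2 = coef
    return a1 * t + a2 * t**2, b1 * t + b2 * t**2, c1 * t + c2 * t**2


def residuals(coef, ref, frames):
    """Predicted-minus-observed brightness over all frames, interior pixels only."""
    out = []
    for k, obs in enumerate(frames, start=1):
        X, Y = surface_coords(*pose(coef, k))
        pred = bilinear(ref, Y + CEN, X + CEN)   # transform the reference frame
        out.append((pred - obs)[MASK])
    return np.concatenate(out)


def fit(ref, frames, coef0, iters=15, h=1e-6):
    """Gauss-Newton on the sum of squared differences, finite-difference Jacobian."""
    coef = np.array(coef0, dtype=float)
    for _ in range(iters):
        r = residuals(coef, ref, frames)
        J = np.empty((r.size, coef.size))
        for j in range(coef.size):
            step = np.zeros_like(coef)
            step[j] = h
            J[:, j] = (residuals(coef + step, ref, frames) - r) / h
        coef += np.linalg.lstsq(J, -r, rcond=None)[0]
    return coef


# Usage: four frames after the reference, smooth motion, zero initial guess ("stasis").
true = np.array([0.2, 0.01, -0.15, 0.005, np.radians(0.3), np.radians(0.03)])
ref = view(0.0, 0.0, 0.0)
frames = [view(*pose(true, k)) for k in range(1, 5)]
est = fit(ref, frames, np.zeros(6))
for k in (1, 4):
    print(k, np.round(pose(true, k), 4), np.round(pose(est, k), 4))
```

With noise-free synthetic frames the fitted poses should match the true ones to a small fraction of a pixel; bilinear resampling adds a slight bias, and real data would add blur and noise. The descriptor degree, frame count and stopping rule here are placeholders, not the values used in the paper.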

We will post a more complete account of the algorithm in a mathematics-friendly format after a longer series of experiments varying the subsequence length, the convergence criterion, and the polynomial degree of the motion descriptor. Early, unoptimised results are illustrated below.

The rotation estimate does more than make rotation available as an extra input degree of freedom.
It also lets δX and δY be transformed into something more useful.
Move your finger quickly from point A to point B, and from there to C.
You can return fast and directly to point A — sight unseen — with high accuracy.
If your hand moves a mouse ABCA, the cursor does not return to its exact starting point.
You have to steer it in by eye, with a delay loop of at least 200 milliseconds.

The cursor adds the mouse's sideways motions, and moves itself by that much X.
It adds the mouse's forward-back motions, and moves itself by that much Y.
Your hand always turns as it moves, so these are not fixed directions!
Put your elbow on the table, hold the mouse straight outward, and pivot on your elbow.

 

The mouse always moves exactly sideways, so the cursor moves only in X:

(To see this best, set Windows Control Panel ⇒ Mouse ⇒ Pointer Options ⇒ pointer speed to Slow.)

Conversely, if the mouse moves straight but rotates as it goes, the cursor's path curves.
Your hand usually does turn the mouse, at least a little.
This does not make it easy to aim long motions of the cursor.

If, and only if, the rotation is known, simple algebra turns 'sideways' and 'forward' back into a geometrically consistent X and Y. (Exactly which X and Y depends on the starting attitude of the mouse, but does not change thereafter.) Standard XY estimation algorithms cannot support this without doubling the sensors. We estimate (δ(sideways), δ(forward), δθ), and can convert (δ(sideways), δ(forward)) to (δX, δY) at trivial cost.
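The 'simple algebra' can be sketched as follows, assuming each frame's estimate arrives as (δ(sideways), δ(forward), δθ) in the mouse's own frame, with θ measured counter-clockwise from the starting attitude. Rotating each step by the mid-step heading is our choice, not necessarily the authors'; `consistent_xy` is a hypothetical name. The usage reproduces the elbow-pivot experiment: the mouse moves only sideways while turning once round.

```python
import math


def consistent_xy(deltas):
    """Accumulate per-frame mouse-frame motions into a fixed X, Y frame.

    deltas: iterable of (d_side, d_fwd, d_theta), d_theta in radians.
    Returns the list of (X, Y) positions, starting at (0, 0).
    """
    x = y = theta = 0.0
    path = [(x, y)]
    for d_side, d_fwd, d_theta in deltas:
        heading = theta + 0.5 * d_theta      # mid-step heading (a modelling choice)
        c, s = math.cos(heading), math.sin(heading)
        x += c * d_side - s * d_fwd          # rotate the mouse-frame step into X, Y
        y += s * d_side + c * d_fwd
        theta += d_theta
        path.append((x, y))
    return path


# Elbow pivot: mouse 100 px from the elbow, one full turn in 360 frames.
STEPS, RADIUS = 360, 100.0
d_theta = 2 * math.pi / STEPS
deltas = [(RADIUS * d_theta, 0.0, d_theta)] * STEPS

naive_x = sum(d[0] for d in deltas)      # XY-only reporting: a straight line in X
path = consistent_xy(deltas)
print(f"naive: X = {naive_x:.1f}; consistent: ends at ({path[-1][0]:.1e}, {path[-1][1]:.1e})")
```

The naive sum ends about 628 pixels along X, while the consistent path traces a circle of diameter about 200 pixels and returns to its start, as the hand did.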

Results

We show here a sample from simulated data. Fixing a trajectory (X(t), Y(t), θ(t)), we moved a virtual window across a 'desk surface' image and resampled it, approximating the CCDs' blur of finer-scale detail with a kernel from [OpenCV]. (Our results from a real moving CCD camera look similar, but we do not yet have precise measurements of 'ground truth' position like those used in the comparisons below. These will follow shortly.)
Moving the window on a circular path, we rotated it separately, to avoid the special 'apparently constant direction' case of the diagrams above. Time is initially slowed to imitate gradual acceleration from stasis (measured in pixels/frame at 1 frame/millisecond, real mouse acceleration is indeed smooth and gradual), so that we could use stasis as the initial motion guess. Smooth motion is realistic on the millisecond scale, since human hand tremor is around 12Hz (a period of over 80 frames); this makes the previously fitted descriptor a good initial guess for each successive frame's fitting process.
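A test trajectory of this kind can be generated as below. This is a hypothetical reconstruction: the quadratic time warp, radius, period, spin rate and ramp length are our illustrative choices, not the values behind the figures. Time is in frames (1 frame = 1 ms) and distances in pixels; the attitude θ turns at its own rate, so it does not simply follow the direction of travel.

```python
import math


def eased_time(t, ramp):
    """Warped time: zero speed at t = 0, unit speed from t = ramp on (continuous slope)."""
    return t * t / (2.0 * ramp) if t < ramp else t - ramp / 2.0


def trajectory(t, radius=40.0, period=3000.0, spin=math.radians(0.3), ramp=300.0):
    """Window pose (X, Y, theta) at frame t: circular path plus a separate steady rotation."""
    s = eased_time(t, ramp)
    phase = 2.0 * math.pi * s / period
    return radius * math.cos(phase), radius * math.sin(phase), spin * s


poses = [trajectory(t) for t in range(3000)]
first_step = math.dist(poses[0][:2], poses[1][:2])       # tiny: starts from rest
cruise_step = math.dist(poses[2000][:2], poses[2001][:2])
print(f"first step {first_step:.2e} px, cruising step {cruise_step:.3f} px")
```

Because speed rises smoothly from zero, 'no motion' is a good first guess, and each later fit can start from its predecessor.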
The red curve shows the path (X(t), Y(t)) as a locus in (X, Y)-space, suppressing time t, and the red arrows show the direction of θ at the corresponding points. The blue curve shows the result of accumulating (δ(sideways), δ(forward)) as if sideways and forward were the same as X and Y. We have given θ a more-than-360° turn (exaggerating what is likely with a mouse), so the two curves are very different.

The blue arrows show the θ values estimated for the same times t. With time t suppressed, the quality of match is not clear. However, we use these θ values to rotate (δ(sideways), δ(forward)) into (δX, δY) motions, and sum them to give the (X, Y) motions below.

The accuracy of the reconstruction of (X(t), Y(t)) becomes obvious.
Note that the spatial units are input-image pixels (for an 18×18-CCD 1mm window, about 55μm across). We applied no smoothing filter beyond the intrinsic smooth-motion fitting of the algorithm itself. Estimation at the integer-pixel level would have given jagged curves, as now a δX increment of one pixel occurs, now a δY increment. Our algorithm visibly yields far finer resolution than this, in both space and time, from the same data now used for regular mice.
The current generation of gaming mice advertises 1600 to 2500 "dpi" precision, using laser illumination, three or more times as many sensor elements as the 18×18 = 324 of a standard sensor, and up to 6000 frames/sec, at a correspondingly multiplied price. Our algorithm achieves at least the same ≈1/100 mm precision, adding at most a custom chip.
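Converting the advertised resolutions into the sensor's own pixels shows what 'sub-pixel' has to mean here: 2500 dpi is about 10 µm per count, under a fifth of a 1/18 mm pixel. The sketch assumes the 1mm, 18-pixel window used throughout.

```python
MM_PER_INCH = 25.4
PITCH_MM = 1.0 / 18      # 18 pixels across a 1mm window (from the text)

for dpi in (1600, 2500):
    step_mm = MM_PER_INCH / dpi                 # distance per reported count
    print(f"{dpi} dpi -> {step_mm * 1000:.1f} um per count "
          f"= {step_mm / PITCH_MM:.2f} pixel")
```

This gives about 15.9 µm (0.29 pixel) per count at 1600 dpi and 10.2 µm (0.18 pixel) at 2500 dpi.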
It is likely that a custom chip will in fact be needed. A single iteration step of our current (completely unoptimised) implementation takes about 5 milliseconds on a 2.66GHz Intel Xeon, and we currently use four iterations per frame. At a 1kHz frame rate these 20 milliseconds of computation must fit into 1 millisecond, so we need to be twenty times faster. Code optimisation will provide some of this, as will experiments with real data to show how few images and how few polynomial coefficients suffice for good results. The algorithm, moreover, is straightforward to parallelise (the matrix entries corresponding to each polynomial coefficient can be handled separately, as can entries comparing different images to the reference one, without limiting the options for vector pipelining). Creating a data-flow chip that can sustain it at kHz frame rates is thus a standard development task, and volume production can then limit the unit cost.
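The parallel structure described above can be illustrated under one assumption: that each Newton step is formed from Gauss-Newton normal equations, JᵀJ Δ = −Jᵀr, with J the Jacobian of the pixel differences with respect to the polynomial coefficients. JᵀJ and Jᵀr are then sums of independent per-image terms, and each matrix entry is an independent dot product, so the work can be spread across hardware pipelines. `normal_equations_by_image` is a hypothetical helper; the random blocks only check that the split sum matches the stacked computation.

```python
import numpy as np

# Time budget from the text: 4 iterations of about 5 ms each, against 1 ms per frame at 1 kHz.
SPEEDUP_NEEDED = 4 * 5.0 / 1.0          # = 20


def normal_equations_by_image(J_blocks, r_blocks):
    """Sum per-image contributions to J^T J and J^T r.

    Each (J_k, r_k) pair compares one image with the reference, so each term can be
    computed independently; within a term, entry (i, j) is one independent dot product.
    """
    n = J_blocks[0].shape[1]
    JTJ = np.zeros((n, n))
    JTr = np.zeros(n)
    for Jk, rk in zip(J_blocks, r_blocks):
        JTJ += Jk.T @ Jk
        JTr += Jk.T @ rk
    return JTJ, JTr


rng = np.random.default_rng(0)
J_blocks = [rng.standard_normal((144, 6)) for _ in range(4)]   # 4 images, 6 coefficients
r_blocks = [rng.standard_normal(144) for _ in range(4)]
JTJ, JTr = normal_equations_by_image(J_blocks, r_blocks)
J, r = np.vstack(J_blocks), np.concatenate(r_blocks)
print(np.allclose(JTJ, J.T @ J), np.allclose(JTr, J.T @ r))   # True True
step = np.linalg.solve(JTJ, -JTr)                            # one Gauss-Newton update
```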

Conclusions

Our algorithm fits a motion descriptor to all the data (not just selected features) of a short sequence of images from a CCD camera array. It achieves sub-pixel precision in motion reporting, and includes rotational data, which can be used both for its own value and to make the reported 'sideways' and 'forward' directions consistent within a motion of the mouse. It is highly parallelisable, allowing implementation on a chip that can process a flow of images at the 1kHz necessary for a 1mm-window optical mouse. No hardware change in an existing optical mouse, other than the introduction of such a chip, is necessary. The existing production line for a mouse with any of the different form factors, button configurations, etc., can thus be used without retooling or an increase in cost.

This appears to be an excellent sensor algorithm on which to base the hardware for
  • a high-precision, economical XY mouse

  • a consistent-direction XY mouse, improving the usability of existing XY-based software

  • a 'mushaca' mouse, improving the usability of existing XY+scroll-based software,
    and enabling a new generation of user interactions in 2D and 3D.


 

References

  1. Agilent sensor spec
  2. Agilent white paper
  3. Engelbart D G, X-Y Position Indicator for a Display System, US Patent 3,541,541, issued 19 Nov. 1970.
  4. Kirsch, S, US patents 4,364,035 (1982), 4,390,873 (1983), 4,546,347 (1985).
  5. OpenCV (Open Source Computer Vision Library).
  6. Poston T & Srikanth M, Computer Input Device Enabling Three Degrees of Freedom and Related Input and Feedback Methods, 2005.
  7. Poston T, Srikanth M B, Dinpuii H K Fente, 3D manipulation with a 3DOF mouse, in preparation.
  8. Poston T & Srikanth M B, Using a mouse that reports consistent-direction translation, and rotation, in preparation, 2006.
  9. [URL for the mushaca web presence]
  10. MacKenzie I S, Soukoreff W R, & Pal C, A two-ball mouse affords three degrees of freedom, in Extended Abstracts of the CHI '97 Conference on Human Factors in Computing Systems, ACM, 1997, 303–304.
  11. Bohn D, Pointing device having rotational sensing mechanisms, US Patent No. 5,561,445.
  12. Zhai S & MacKenzie I S, Teaching old mice new tricks: Innovations in computer mouse design, Proc. Ergon-Axia '98 — the First World Congress on Ergonomics for Global Quality and Productivity, 80–83, 1998.