Fast and stable direct relative orientation of UAV-based stereo pair

Recent advances in a UAV’s low cost direct geo-referencing utilizing a navigational grade of a GPS/GNSS board and an Inertial Navigation System (INS) mounted on the aerial platform have been used for numerous applications such as monitoring and deformation [1]–[4], forestry and agriculture [5]–[7], mining [8]–[10] , civil engineering [11]–[14], 3D city model and archaeology [15]–[17], and so on. Both devices provide complete navigational solutions such as velocity, attitude and position to safely guide a flying UAV to complete an image acquisition mission. During the entire UAV flight, however, the GPS navigational solution offers relatively consistent accuracy but it is lacking of attitude determination [18], [19]. While images taken from UAV mission are increasingly applied for a wide range of surveying and mapping related applications [16], [20]–[22], they need a relative pose information between subsequent overlapping images so that further data processing workflow such as image matching [15], [23]–[25] and structure from motion [7], [26], [27] can be performed. Considering paramount importance of the relative pose between two overlapping images, this paper outlines coplanarity condition existed in a stereo configuration to determine the RO parameters between them. Then, it follows a practical implementation of the RO method when additional 3D information ARTICL E INFO ABSTRACT

Coplanarity-based relative orientation (RO) is one of the most crucial processes to obtain reliable 3D model and point clouds in Computer Vision and Photogrammetry community. Whilst a classical and rigorous procedure requires very close approximate values of five independent parameters, a direct method needs additional constraints to solve the parameters. This paper proposes a new approach that facilitates a very fast but stable and accurate solution from five point correspondences between two overlapping aerial images taken form unmanned aerial vehicle (UAV) flight. Furthermore, if 3D coordinates of perspective centers are available form geotagged images, rotational elements of the RO parameters can be quickly solved using three correspondences only. So it is very reliable for a provision of closed-form solutions for the rigorous methods. Our formulation regards Thompson's parameterizations of Euler angles in composing a coplanarity condition. Nonlinear terms are subsequently added into a stereo parallax within a constant term under a linear least squares criteria. This strategy is considered new as compared with the known literatures since the proposed approach can find optimal solution. Results from real datasets confirm that our method produces a fast, stable and reliable linear solution by using at least five correspondences or even only three conjugate points of geotagged image pairs. encoded in an exchangeable image format (EXIF) form is available on each captured image taken during the UAV mission.
A relative orientation process recreates relative translation and angular relationships between two successive overlapping images that existed at the time of photography [28]. A relative orientation consisting of translation and rotation in the stereo images is a prerequisite to retrieve 3D structures from images. The most fundamental problem in geometric computer vision and photogrammetry is a determination of the RO or relative pose from point correspondences between two overlapping images. Numerous works for recovering the position and orientation of stereo images have been shown. Early attempts to reconstruct a scene from the position and rotation from image correspondences utilize projective theory on a coplanarity constraint [29]- [32]. A solid theoretical foundation about projective significance of the RO matrix was recognized, known as the Fundamental/Essential matrix for describing the geometry of an image pair. Algebraic projective geometry was used to generate polynomial system iteratively to yield an optimal and exact Essential matrix. This method uses 8 point correspondences to the approximate values, then enters into the least squares adjustment with linearized version of the system. One major drawback is the low stability of the system and its use of Gauss-Newton elimination being susceptible to all types of perturbations [33].
Seminal achievements of the scene reconstruction based on this matrix are due to [34], [35]. They pioneer a further work on the relative orientation improvements. Different strategies and different numbers of minimal correspondences are used to solve the intractable problem using this simplest matrix. For examples, the use of orthogonalization algorithm [36], eigenvectors and eigenvalues [37], singular value decomposition (SVD) [35], quaternions [38], and normalized image coordinates [39] increases the stability and reliability of the resulting matrix. Although an existence of the Essential matrix can be determined with a minimum number of four or fewer point correspondences [40], the most stable and linearly unique solution is given by [39] which use eight point correspondences or more. Other methods using five to seven point correspondences are outlined in [41]- [43].
Other methods of determining the RO are by exploiting a coplanarity condition of the two adjacent images [28] as shown in Fig.1. The geometry of the point correspondence reveals the geometric relations between the scene point and the image points. Assuming the scene point P is static and two aerial images are taken from two successive exposures with a calibrated camera mounted on the flying platform, the RO is described by the two independent sets of exterior orientation parameters (i.e. 6 parameters of each image and thus 12 parameters altogether). Since the scene point object will be reconstructed up to a spatial similarity transformation, which is comprised of seven parameters (i.e. three translations, three rotations, and one arbitrary scale), it means that only 5 parameters out of the 12 total exterior orientation parameters are determinable. This situation is realized by fixing one image (i.e. left image) such that the pose of these images is relatively oriented with respect to each other. Hence, the object points can be reconstructed at an arbitrary scale only up to spatial similarity transformation, or so called a photogrammetric model. Thus, the rotation matrix 2 of the right image and the direction of the baseline b connecting two projection centers of 1 and 2 are chosen as the RO parameters.
A direct method to determine these parameters based on the coplanarity constraint is reported in [44]- [47]. It is derived by direct linear transformation (i.e. DLT) from the coplanarity condition equation and this method is linear with respect to the 8 unknown parameters [46], [47]. A direct solution for these parameters can be achieved without knowing any approximate values. However, a duality problem of a solution is still exhibited [47]. Attempts to improve the solution are also reported. An alternative approach by imposing four non-linear constraints by deriving inherent orthogonal properties of rotation matrix [45] improves the solution. Another attempt is by adding seven constraints to control and adjust the solution parameters [44]. Six constraints are deduced from the orthogonality of the rotation matrix, and the last one arises from the decomposition of the baseline. Furthermore, an attempt to incorporate a RANSAC algorithm in the method to filter out gross errors in the RO solution is also reported by [48].
Instead of decomposing the essential matrix into the rotation and translation parameters of the pose in the direct method, the rotational and translation elements are directly computed into the coplanarity condition. If the epipolar plane defined by the vectors of b, 1 and 2 , which also contain the image point 1 and 2 , the computational solution of the RO utilizes a condition that an object point P in an arbitrary coordinate system of the photogrammetric model and the two perspective centers of 1 and 2 must lie in a plane (i.e. a coplanarity constraint). The coplanarity equation is a scalar triple product of a volume of a parallelepiped of these three vectors. If the base of the parallelepiped defined by the any first two out of three vectors and its height by the remaining one, the volume of parallelepiped will be zero if the third vector lies in the plane of the base, making it coplanar with the first two vectors. The direct linear solution of this method uses an extensive algebraic manipulation [46], [47], however a duality of the solution arises due to perturbations in image point coordinates [47], [49]. To remedy the result, further constraints are applied to eliminate the influence of over parameterizations of the direct relative orientation model [44], [45], [48]. These improved methods are claimed to be more suitable for UAV flying at low altitudes [48]. Therefore, we propose a hybrid method for solving the RO parameters between two overlapping aerial images using point correspondences from a stereo pair of nadir view looking images taken by the flying UAV platform and the Euler angles parameterization. We also propose a practical method to compute the relative angular rotations between the geotagged images based on the previous finding by using three point correspondences only. This paper is organized as follows: firstly, the coplanarity condition is elaborated further using Thompson's parameterizations of Euler angles to derive a linear and iterative solution. Secondly, a practical implementation is shown using a derived formula to compute a closed form solution of the relative rotational elements between two geotagged images. Then examples of direct and rigorous computational procedures are given.

Method
In this section, we illustrate two methods of computing the RO parameters using the Euler parameterization. In the first part, we discuss a technique to retrieve the five RO parameters using five or preferably six point correspondences. In the second part, we discuss a method of utilizing geotagged images to compute three rotational elements of the RO parameters from three point correspondences only. The coplanarity condition in Fig. 1 implies a situation in which the object point P and its corresponding image point 1 and 2 on two overlapping images are located on the same plane with the baseline vector b. When this condition is achieved, the vector 1 will have an intersection with the vector 2 , and these vectors together with the baseline vector b will be coplanar and a scalar triple product of them is zero. The mathematical model in a determinant form of one pair of corresponding point is given by [50]: Equation (1) is the coplanarity condition in the form of a scalar triple product of the volume of a parallelepiped. Its determinant form consists of three vector components of b, 1 and 2 . The baseline vector of b is extracted by subtracting two perspective centers of 1 and 2 of Cartesian coordinates of the left image and the right image respectively as follows, The 1 , 1 and 1 as well as the 2 , 2 and 2 are the perspective center of the left and right image respectively, whereas 1 and 2 represent the vectors in the model or object space system. If 1 and 2 represent vectors in image space, a relationship between the vector in the image space and the vector in the model space is as follows, by assuming a unity scale factor: A 3 by 3 rotation matrix rotates the image space vector into vector in the model coordinates system whose elements constitute the exterior orientation parameters with rotation angles of ω, ϕ, κ [51]. It is verified that the vector equation of (1) is the same as a matrix equation if a skew matrix is utilized, If the left image is fixed and the origin of the local 3D model is located in the projection center of the left image and oriented parallel to its image coordinate system as in Fig. 1, the exterior orientation parameters can be chosen as 1 = 1 = 1 = 0 , also  1 =  1 =  1 = 0. Therefore the rotation matrix 1 will be equal to the Identity matrix of I and the vector 1 can be reduced to: According to (6), the model coordinate systems are aligned with the left image coordinate system. The baseline vector b is also defined in the model coordinate system and has the base components of , and connecting the two perspective centers of 1 and 2 . Suppose the perspective center 2 is displaced along the baseline toward 1 , it is clear from the Fig. 1 that the vector 2 will still be coplanar with the baseline b and that vector will intersect in a point lying on the line between 1 and 1 . From a relation of similar triangles, the scale of the model will be directly proportional to the length of the baseline. Therefore, the model coordinate system can be scaled by an arbitrary factor depending of the choice of the baseline length. For simplicity, the longest component of the baseline vector is set to a constant value of ′ (i.e. ′ = ⁄ = 1) [28]. As a consequence of these facts, the other two baseline components are adjusted accordingly into ′ = ⁄ and ′ = ⁄ . These divisions mean that a direction of the unit vector of the baseline components remains constant irrespective of the baseline length chosen [28]. Now, three rotation elements only out of five elements of the relative orientation remained. The computational solution of (5) can be simplified into the model coordinate system as follows, The coplanarity equation of (7) has five parameters which are ′ , ′ and the three independent parameters of 2 :  2 ,  2 ,  2 . To preserve a projective collineation and stability of the computation, 3D vectors of conjugate image points of 1 and 2 in image space are normalized by using Euclidian normalization such that the homogenous part has Euclidian norm 1. Assuming both image have an equal principal distance of c as shown in Fig. 1, and since 1 = 2 = , we may divide the coordinates of point 1 and 2 on each image by the c to obtain a normalize image plane coordinates such that: From (8) it is clearly showed that the coordinates of point 1 and 2 are now to be understood as the ratios of the measured image coordinates to principal distance c. These homogenous coordinates of measured image points will be independent of an image measurement unit such as pixels or millimeters. Equation (7) is now The orthogonal rotation matrix 2 has three degrees of freedom out of nine elements. Hence it is possible to express all the nine elements in terms of three independent parameters. There are wellknown methods [52] of imposing orthogonality constraints on a reduction to three independent parameters [33], but these introduce other circular functions of parameters [40] and make the direct solution sets of (7) or (9) harder to solve [53], [54]. Alternatively, the orthogonal matrix 2 is expressed with respect to three independent parameters in the form of a rational algebraic relation only. The usefulness of such a representation depends on the simplicity of the resulting matrix. In photogrammetric community, the representation in terms of Euler angles can be expressed as follows [30], Where Δ = 1 + 1 4 ⁄ ( 2 2 + 2 2 + 2 2 ) and Δ ′ = 1 − 1 4 ⁄ ( 2 2 + 2 2 + 2 2 ) are orthogonal. If 2 in (10) is substituted into (9), and after some algebraic simplifications it becomes [30], Where: Equation (11) expresses the y-parallax of 1 − 2 in terms of the elements of RO in an exact rational algebraic form. The first line of (11) constitutes the linear terms, but the terms having 1 , 2 , and 3 as coefficients are of the second order, while 1 and 2 are the third order terms. The direct solution of (11) can be solved by using at least five point correspondences. The solution must proceed by successive approximation. The approximate values are calculated from the linear terms and substituted into higher order terms.
When the aerial geotagged images are available, 3D Cartesian coordinates can be extracted from EXIF information of each image. Since the base line vectors b can be determined from (2), the values of ′ and ′ can be imposed as constraints on (11). Therefore, the numbers of RO parameters are reduced to the rotational elements only. These remaining three parameters can be solved by (11) using three points correspondences.
The RO parameters in (11) are solved by five pairs of observations on image point correspondences, or preferably at the usual six pairs on which sufficient information occurs to solve unknowns. However if positional information of captured image is available on the geotagged images, the number of image point observation are decreased into a minimum of three pairs. A straightforward procedure is to solve the five RO parameters by least squares adjustment of = , and one pair of image points of observation yields, The matrix in (18) consists of partial derivatives of (11) with respect to the five RO parameters. The 1 is a constant term of (11), whereas S and T comprise the second order terms in (12) to (14) and third order terms in (15) to (16) respectively. Its computational procedures start from obtaining approximate values by ignoring S and T. In the next iteration, the terms of S and T must be regarded as they were part of the constant terms of the 1 . For example, in the second iteration, the value of S can be obtained by utilizing previously approximated parameters in the first iteration, and the new constant terms of 1 are now become: Now the least squares adjustment computation of = is repeated in which the l is pertaining to the 1 + of (19). After new values of parameter x are readjusted, the third order terms are readily available by using the new parameters of the second iteration: Once again, final parameters values are determined by the least squares adjustment computation, but now the constant terms of is directly related with the 1 + + of (20). It is likely that two iterations would be sufficient in many cases, and often it will be possible to neglect T entirely.
If the 3D coordinates of the perspective centers between two overlapping images are known from the EXIF header of the geotagged ones [55] or from other methods [51], [52], [56], these coordinates can accelerate a provision of the closed-form solution or approximation of the RO parameters by imposing a constraint on the baseline vectors. From the EXIF reading the values of , and can be computed. Therefore the ′ = ⁄ and ′ = ⁄ become known parameters. Hence the first iteration of the linear terms on the matrix and 1 can be expressed as, Its computational procedures for the second order and third order terms are identical with that of (19) and (20) respectively:

Results and Discussion
Field observations were carried out in Malang city. An array of around 20 ground control points (GCPs) is established from a white concentric ring surrounded with dark background for point correspondences (Fig. 2). To avoid false matches and to facilitate a possible highest accuracy of image coordinate measurements of GCPs on stereo images, least squares image matching are performed to seek the best matches on stereo images [23]. Furthermore, the image coordinates of matched points of the stereo images are represented in Table 1. The images are photographed with a fixed focal length of 35mm and the image coordinates are not corrected for the principal point offset or for the lens distortions. Table 1 shows image coordinates of the left and right images in a metric unit instead of pixels. Fig. 2. Some of the GCPs on the Field as points correspondences from cropped stereo images From the geotagged stereo images (Fig. 2), approximate geographical coordinates of both images' perspective center are readily available decoded on the EXIF format header on each image and they are used to determine the baseline vectors between the two images. Result of the perspective center coordinates reading on each image is shown in Table 2. It shows a converted Cartesian coordinates from the geographical coordinates. The conversion is performed by using widely available open source software. The projective center coordinates of each image are then utilized to calculate the baseline vector components between two images as in (2). If the GCPs are surveyed using geodetic type of GPS, the obtained geographical coordinates can be verified using space resection methods [51], [52], [56] that needs at least three GCPs appeared on both images. The baseline vector is shown in the second column  To uncertain a reliability and stability performance of the two algorithms of (18) and (22), a series of experimentation is conducted using randomly selected ten points of correspondences between two images as shown in Table 3. The first algorithm of (18) consists of five parameters and the second algorithm of (22) comprises only three rotational parameters which are to be evaluated. An evaluation of the first algorithm utilizes a permutation of 5, 6, 7, 8, and 9 points out of ten points of correspondences and the second one uses a permutation of 3 to 9 points respectively. Each set of permutation describes a different spatial distribution of points on the images. C/C++ codes are developed for both algorithms, where a benchmark method and its codes are available in [57] that conform the classical photogrammetric dependent RO. All the source codes and data are available on GitHub [https://github.com/edwin-tn/ro-th.git]. A reliability test of the algorithms uses a full set of 10 correspondent points, and then results of each one are compared against a benchmark illustrated in Table  3 and Fig. 3. It depicts discrepancies of each corresponding parameters of the 5 and 3 parameter algorithms against the benchmark's ones. Table 3. Results of the Benchmark, 5-Parameter Algorithm, and 3-Parameter Algorithm

b'z (unitless)
-0.047000 -0.047286 -Considerably attention must be taken care. Since both 5-parameter and 3-parameter algorithms are based on fast linear solutions of (18) and (22) respectively, they do not need approximate parameter values comparing to that of the benchmark method, and better still give very close solutions to the final ones. Another advantage is that rotational elements of the RO parameters could be calculated with reasonable accuracy by using only with a rough or unprecise coordinates of the perspective center of images. Overall, it is clearly showed that the rotational elements of both algorithms give very close values to the benchmark. Although the Phi rotation gives a small deviation of about less than 1 degree for the 3 parameter algorithms, it can be understood that since imprecise geotagged coordinates constraint the baseline vector. On the other hand, the normalized baseline units vectors of ′ and ′ are practically equal between those of the benchmark and the 5-parameter algorithm. Fig. 3. Comparison results of the 5 and 3 parameter algorithms against a benchmark's method A stability performance of the 5-parameter and 3-parameter algorithms is investigated by choosing correspondent points randomly altogether in sequences, numbers and spatial locations on stereo images. In the first group, it starts form a minimum number of input points, which is five points of correspondences for the 5-parameter algorithm and is three points of correspondences for the 3parameter algorithm respectively. In the next group and after that, the numbers of input points are incrementally increased by one point up to nine points of correspondences. Hence there are five groups of correspondences for the 5-parameter algorithm and are seven groups of correspondences for the 3parameter algorithm. Within each group, the numbers of selected points are randomly permutated to give a locational variability. For example, in the group of a three point there would be 120 sets of input points as it is a result of a combination of 3 out of 10 points. In this research, however, not all combinations on each group were performed.
Let's start with a three point and a four point sets of the 3-parameter algorithm since the 5-parameter one is not applicable for this case. These sets of points do not meet a requirement of the minimum numbers of points for the 5-parameter algorithm. Fig. 4 illustrates that averaged deviations of the omega (), phi (), and kappa () rotation from the benchmark's solutions are around 0.28, 0.8, and 0.04 degrees respectively for a three point data set and it yields approximately equal results for a four point data set that are 0.26, 0.6, and 0.08 degrees for the same sequences. These results are unexpectedly outstanding since low accuracy navigational grade GPS data were used to determine the baseline vectors between two stereo images. Nonetheless, erratic curves of the Phi rotations on both data sets suspect contaminated data with noises. Around 25% of the Phi rotation discrepancies surpass 2 degrees. However these numbers of deviations on an average of 0.8 degrees for 3 point data set and of 0.6 degrees for 4 point data set are still reliable closed-form solutions to compute the final solution rigorously using the iterative least squares adjustments. For the 5-parameter one, the numbers of point set start from five points to uniquely determine all the five RO parameters. Fig. 5 to Fig. 9 depicts discrepancies of the RO parameters between the 5-parameter algorithm against the benchmark's parameters as well as the 3-parameter algorithm against the benchmark solutions. Both algorithms give smaller numbers of deviations when the numbers of correspondent points are increased. For the 5-parameter algorithm, the deviations of the normalized unit vector components of ′ and ′ against those of the benchmark are practically not significant to the increased numbers of input points of correspondences. On the other hand, the rotational elements of the 5parameter algorithm are closer to those of benchmarks than that of the 3-parameter algorithm due to the usage of a low accuracy, navigational grade GPS (Fig. 10). Throughout results of the RO parameters of both algorithms are presented in Table 4 and Table 5.  Table 4 and Table 5 clearly indicate that the 5-parameter algorithm gives better result than that of the 3-parameter one against the benchmark values in the Table 3. A tangible benefit of the 5-parameter algorithm is that it can determine very close values of the RO parameter without a need of approximate ones. This algorithm presents no complicated algebraic manipulation or an exhausted polynomial root finding. Within two or three iteration cycles, the solutions are determinable. Normally, the solutions are existed when the minimum numbers of five point correspondences are reached. The solutions are relatively stable for different numbers of correspondences and for different locational distribution of points on images. An advantage of the 3-parameter algorithm is that it needs only three correspondences if geotagged stereo images are available. Constraining the baseline's vectors of the overlapped images yields faster computation process on the RO rotational elements. However, due to the utilization of the low accuracy of GPS receiver, the resulted RO parameters' accuracies are slightly downgraded, but it still closed enough to the benchmark's ones.  To evaluate computational times needed to solve the RO parameters of the proposed algorithms, Table 6 illustrates a computational speed of each proposed one to calculate the RO parameters of each group's point set. All algorithms are computed on the Pentium-i7 CPU and they are not optimized for CPU or GPU parallel processing capabilities. It is shown that computational speeds of our proposed algorithms are much faster than that of the benchmark algorithm, especially for the 3-parameter algorithm. Both of our methods are significantly reduced the processing times and the methods are able to give very close approximations to the final values of the Benchmark algorithm. Table 6. Computational times between each algorithm To sum up, two new approaches to determine the RO parameters of the stereo images have been elaborated. Based on conducted field tests, both methods are fast and stable to compute the parameters without approximate values. This can be achieved through a parameterization of the rotation elements according to [30], and through implementing a step by step computation of the nonlinear terms in an iterative fashion like a least squares adjustment computation procedures. Random noises still affect our methods particularly for the 3-parameter algorithm, but their results are still reliable as a closed-form solution.

Conclusion
The most important achievement of this paper is to demonstrate two new approaches of computing the relative orientation parameters between overlapping aerial images from the UAV flight mission. Both algorithms are based upon a parameterization of Euler angles in composing a coplanarity condition and an incremental parallax minimization by treating nonlinear terms under a linear least squares criteria. These new proposed approaches are fast, stable and reliable to approximate initial values for a more rigorous solution of the relative orientation computation. Constraining the coplanarity condition by the navigational grades of GPS coordinates can speed up the computational process without sacrificing the results. New prospects of further studies are opened into a new horizon for the Computer Vision and Photogrammetry community such as implementing these algorithms for a gross error detection using RANSAC like methods, as well as for multi view stereo methods for generating 3D point clouds.