Mutual information is a quantity from probability theory that quantifies the mutual dependence between two random variables. Informally, mutual information measures the information shared by the input variables. Independent variables have mutual information equal to zero, because one variable provides no information about the other.
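The definition above can be made concrete with a short sketch. The function below (a minimal illustration, not part of any cited method) computes I(X;Y) in bits from a joint probability table, and the two example tables show that independence gives zero mutual information while full dependence gives I(X;Y) = H(X):

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information I(X;Y) in bits from a joint probability table p(x, y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal distribution of X
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal distribution of Y
    nz = p_xy > 0                           # skip zero cells to avoid log(0)
    return float((p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])).sum())

# Independent variables: the joint equals the product of the marginals, so I(X;Y) = 0.
independent = np.outer([0.5, 0.5], [0.25, 0.75])

# Fully dependent variables: Y is determined by X, so I(X;Y) = H(X) = 1 bit.
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])
```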
Several variations of the original mutual information have been proposed to suit different applications. Many applications, for example, require a metric that measures the distance between points; one such variant is known as the variation of information. Another example is the conditional mutual information, which expresses the mutual information of two random variables conditioned on a third.
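As a brief illustration of the first variant, the variation of information can be written VI(X;Y) = H(X,Y) - I(X;Y); unlike mutual information itself, it satisfies the metric axioms. A minimal sketch (the function name and tables are illustrative, not from the sources):

```python
import numpy as np

def variation_of_information(p_xy):
    """VI(X;Y) = H(X,Y) - I(X;Y), a true distance metric, in bits."""
    p_xy = np.asarray(p_xy, dtype=float)

    def H(p):
        """Shannon entropy in bits; boolean indexing flattens any shape."""
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    h_joint = H(p_xy)
    mi = H(p_xy.sum(axis=1)) + H(p_xy.sum(axis=0)) - h_joint
    return h_joint - mi

# Identical variables -> distance 0; independent uniform bits -> distance H(X) + H(Y) = 2.
identical = np.diag([0.5, 0.5])
independent = np.outer([0.5, 0.5], [0.5, 0.5])
```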
In image processing, calculating the mutual information criterion requires determining the joint histogram of the two input images. This feature space is a two-dimensional plot showing, for all corresponding points, the combinations of grey values in the two images. The joint histogram changes as the alignment of the images varies.
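A joint histogram of this kind can be sketched in a few lines of NumPy (the bin count of 32 and the random test images are illustrative choices, not from the cited papers). For two identical, perfectly aligned images every corresponding pair of grey values is equal, so all the mass falls on the diagonal of the histogram:

```python
import numpy as np

def joint_histogram(img_a, img_b, bins=32):
    """2-D histogram of corresponding grey-value pairs in two equally sized images."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    return hist

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(64, 64))  # synthetic grey-level image
h = joint_histogram(a, a)                # image against itself: diagonal histogram
```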
The feature space of correctly registered images presents some peculiarities. The overlap of corresponding anatomical structures induces clusters in the joint histogram at the grey values of those structures.
Method application - Image Registration
Mutual information based methods are widely used for inter-modality image registration. Inter-modality registration is needed to correct differences between images of different types. This task is more difficult than intra-modality registration (which compares images of the same modality, possibly taken from different subjects), because there is no direct relation between the intensities of the two images. Intuitively, the criterion seeks the alignment at which the combined image carries as little information as possible beyond the individual images. The mutual information method can be defined in terms of entropies: since I(A;B) = H(A) + H(B) - H(A,B), maximizing the mutual information I(A;B) of the two images amounts to minimizing their joint entropy H(A,B). Entropy can also be viewed as a measure of uncertainty, and different definitions of entropy can be chosen when building an entropy-based mutual information criterion. Furthermore, several adaptations of mutual information have been proposed: normalization with respect to the overlapping part of the images and inclusion of spatial information.
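The entropy decomposition I(A;B) = H(A) + H(B) - H(A,B) can be estimated directly from the joint histogram of the two images. Below is a minimal sketch (the 32-bin plug-in estimate and the synthetic test images are illustrative assumptions, not the procedure of any cited paper):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a distribution of any shape."""
    p = p[p > 0]                             # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def mi_from_images(img_a, img_b, bins=32):
    """Estimate I(A;B) = H(A) + H(B) - H(A,B) from the joint grey-value histogram."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = hist / hist.sum()                 # joint distribution of grey-value pairs
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)  # marginal distributions
    return entropy(p_a) + entropy(p_b) - entropy(p_ab)

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(64, 64))
b = rng.integers(0, 256, size=(64, 64))      # unrelated image: low mutual information
```

An image compared with itself yields the maximal value I(A;A) = H(A), while two unrelated images yield a value near zero.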
Viola et al. proposed a method based on the maximization of the statistical dependence of voxel intensities in the images to be registered. The mutual information criterion only assumes that the data are stationary; it does not assume any specific relation between voxel intensities in the different modalities. This information-theoretic criterion can therefore be applied to any pair of modalities without modification.
Thévenaz et al. present a modification of the method with a highly efficient optimizer for the maximization of mutual information, designed to converge in very few criterion evaluations. The optimizer takes advantage of the differentiability of the criterion to obtain a global understanding of its behavior near the optimum.
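As a toy illustration of maximizing the criterion over a transform space (deliberately not the gradient-based Thévenaz–Unser optimizer: an exhaustive search over integer translations, with wrap-around shifts and a 32-bin histogram as simplifying assumptions):

```python
import numpy as np

def mutual_info(a, b, bins=32):
    """MI of two images in bits, estimated from their joint grey-value histogram."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = hist / hist.sum()
    nz = p > 0
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    return float((p[nz] * np.log2(p[nz] / (px * py)[nz])).sum())

def best_shift(fixed, moving, max_shift=5):
    """Exhaustively search integer translations for the MI-maximizing shift."""
    best, best_mi = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(moving, (dy, dx), axis=(0, 1))  # wrap-around shift
            mi = mutual_info(fixed, candidate)
            if mi > best_mi:
                best, best_mi = (dy, dx), mi
    return best

rng = np.random.default_rng(1)
fixed = rng.integers(0, 256, size=(32, 32)).astype(float)
moving = np.roll(fixed, (2, 3), axis=(0, 1))  # misalign by a known translation
```

Searching recovers the shift (-2, -3) that undoes the misalignment, because mutual information peaks when the images coincide. A smarter optimizer replaces this brute-force loop with far fewer criterion evaluations.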
Some disadvantages of the mutual information method are discussed by Kostelec et al. . The paper concludes that the method is computationally very expensive, as well as sensitive to the interpolation procedure. Moreover, the optimum found may be a local one rather than the correct, global one.
 - P. Viola and W. M. Wells III, “Alignment by maximization of mutual information,” in Proc. 5th Int. Conf. Computer Vision, Boston, MA, June 20–23, 1995, pp. 16–23.
 - P. Thévenaz and M. Unser, “Optimization of Mutual Information for Multiresolution Image Registration,” in IEEE Transactions on Image Processing, Vol. 9, No. 12, December 2000.
 - Peter J. Kostelec and Senthil Periaswamy, “Image Registration for MRI,” in Modern Signal Processing, Vol. 46, 2003.
 - Josien P. W. Pluim, J. B. Antoine Maintz and Max A. Viergever, “Mutual information based registration of medical images: a survey,” in IEEE Transactions on Medical Imaging, Vol. XX, No. Y, 2003.