Segmenting letters, words and paragraphs.
In this example, a digitized text is processed to identify the letters, words and paragraphs. This demonstration uses only the ialabel function with different connectivity parameters.
The text image is read.
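    import ia870 as mm
    from numpy import ones  # used below to build the rectangular structuring elements

    # adreadgray and adshow are assumed to be provided by the execution environment
    f = adreadgray('stext.tif')
    adshow(f)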
First, label the letters.
The letters are the main connected components in the image, so we use the classical 8-connectivity criterion to identify each letter.
    fl = mm.ialabel(f, mm.iasebox())
    adshow(mm.iaglblshow(fl))
    print('Number of labels: %d' % fl.max())
Number of labels: 262
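To make the 8-connectivity criterion concrete, here is a minimal plain-Python sketch of connected-component labeling by flood fill on a toy binary image. It is not how ialabel is implemented; the function name label_8connected and the toy image are hypothetical and only illustrate the idea that pixels touching horizontally, vertically or diagonally receive the same label.

```python
from collections import deque

def label_8connected(image):
    """Give the same positive label to foreground pixels that are 8-connected."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or labels[r][c]:
                continue  # background, or already labeled
            current += 1  # a new component starts here
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Scan the 3x3 neighborhood around (y, x).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels

# Hypothetical toy image: four separate blobs, one of them joined only diagonally.
toy = [
    [1, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
]
toy_labels = label_8connected(toy)
num_labels = max(max(row) for row in toy_labels)
print('Number of labels: %d' % num_labels)  # prints "Number of labels: 4"
```

The two pixels in the upper right corner touch only diagonally, yet they share a label, which is what distinguishes 8-connectivity from 4-connectivity.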
Second, label the words.
Words are made of letters that are close to each other. In this case we use a connectivity specified by a rectangular structuring element 7 pixels high and 11 pixels wide, so any two pixels that can be hit by this rectangle belong to the same connected component. The values 7 and 11 were chosen experimentally and depend on the font size.
    sew = mm.iaimg2se(mm.iabinary(ones((7, 11))))
    adshow(mm.iaseshow(sew, 'EXPAND'))
    fw = mm.ialabel(f, sew)
    adshow(mm.iaglblshow(fw))
    print('Number of labels: %d' % fw.max())
Number of labels: 44
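The effect of a rectangular connectivity can be sketched in plain Python as follows. This is an illustration only, not the ialabel implementation: the function label_box is hypothetical, and it assumes the rectangle has odd sides and is centered on its origin, so that two foreground pixels are neighbors when one lies inside the rectangle centered on the other.

```python
from collections import deque

def label_box(image, box_height, box_width):
    """Label foreground pixels; two pixels are neighbors when one lies inside a
    box_height x box_width rectangle centered on the other (odd sizes assumed)."""
    half_h, half_w = box_height // 2, box_width // 2
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or labels[r][c]:
                continue
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Scan the rectangle centered on (y, x), clipped to the image.
                for ny in range(max(0, y - half_h), min(rows, y + half_h + 1)):
                    for nx in range(max(0, x - half_w), min(cols, x + half_w + 1)):
                        if image[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

# Hypothetical one-row "text line": letters are one blank column apart inside
# a word, and the two words are four blank columns apart.
line = [[1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1]]
_, n_letters = label_box(line, 3, 3)  # 3x3 box, i.e. 8-connectivity
_, n_words = label_box(line, 3, 5)    # reaches two columns to each side
print('letters: %d, words: %d' % (n_letters, n_words))  # prints "letters: 5, words: 2"
```

Widening the rectangle bridges the small gaps between letters but not the larger gap between words, which is why the label count drops from the number of letters to the number of words.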
Finally, label the paragraphs.
Similarly, paragraphs are made of words that are close to each other. In this case the connectivity is given by a rectangle 35 pixels wide and 20 pixels high.
    sep = mm.iaimg2se(mm.iabinary(ones((20, 35))))
    fp = mm.ialabel(f, sep)
    adshow(mm.iaglblshow(fp))
    print('Number of labels: %d' % fp.max())
Number of labels: 3
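Because the rectangle sizes depend on the font and the page layout, one way to explore them is to sweep the reach of the connectivity and watch how the number of labels changes. The sketch below does this in plain Python on a hypothetical set of pixel coordinates; count_components and the layout are illustrative assumptions, not part of ia870, and the pairwise comparison is quadratic, so it is only meant for tiny examples.

```python
def count_components(pixels, half_h, half_w):
    """Count groups of pixels linked by |dy| <= half_h and |dx| <= half_w."""
    parent = list(range(len(pixels)))

    def find(i):
        # Follow parent links to the representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (y1, x1) in enumerate(pixels):
        for j in range(i + 1, len(pixels)):
            y2, x2 = pixels[j]
            if abs(y1 - y2) <= half_h and abs(x1 - x2) <= half_w:
                parent[find(i)] = find(j)  # merge the two groups
    return len({find(i) for i in range(len(pixels))})

# Hypothetical layout, as (row, column) coordinates of foreground pixels:
# two text lines of a first paragraph (rows 0 and 3), then a second
# paragraph further down (row 12).
pixels = [(0, 0), (0, 2), (0, 7), (0, 9),
          (3, 0), (3, 2), (3, 7),
          (12, 0), (12, 2)]

for half_h, half_w in [(1, 1), (1, 2), (3, 5), (9, 5)]:
    n = count_components(pixels, half_h, half_w)
    print('reach (%d, %d): %d labels' % (half_h, half_w, n))
# prints 9, 5, 2 and 1 labels, respectively
```

In this toy layout the counts correspond to isolated pixels, words, paragraphs and finally the whole page, mirroring the progression from 262 to 44 to 3 labels in the example above.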
- ialabel - Labeling of connected components