contentRecog package

Framework for recognition of content; OCR, image recognition, etc. When authors don’t provide sufficient information for a screen reader user to determine the content of something, various tools can be used to attempt to recognize the content from an image. Some examples are optical character recognition (OCR) to recognize text in an image and the Microsoft Cognitive Services Computer Vision and Google Cloud Vision APIs to describe images. Recognizers take an image and produce text. They are implemented using the L{ContentRecognizer} class.
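For orientation, here is a minimal sketch of a custom recognizer. It is illustrative only: myOcrLib and its recognize function are hypothetical stand-ins for a real OCR engine, and the work is pushed to a background thread because recognize must not block.

    import threading

    import myOcrLib  # hypothetical OCR engine

    from contentRecog import ContentRecognizer, RecogImageInfo, SimpleTextResult


    class MyRecognizer(ContentRecognizer):
        def getResizeFactor(self, width: int, height: int):
            # Hypothetical heuristic: upscale small captures for better accuracy.
            return 2 if width < 100 or height < 100 else 1

        def recognize(self, pixels, imageInfo: RecogImageInfo, onResult):
            def worker():
                try:
                    # The engine returns plain text for the (resized) image.
                    text = myOcrLib.recognize(
                        bytes(pixels), imageInfo.recogWidth, imageInfo.recogHeight
                    )
                except Exception as e:
                    onResult(e)
                    return
                onResult(SimpleTextResult(text))

            # Run recognition off the main thread so this method doesn't block.
            threading.Thread(target=worker, daemon=True).start()

        def cancel(self):
            # A real implementation would abort the pending engine request here.
            pass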

class contentRecog.BaseContentRecogTextInfo(*args, **kwargs)

Bases: _ReviewCursorManagerTextInfo

The TextInfo class that all TextInfos emitted by implementations of RecognitionResult must inherit from.

Constructor. Subclasses must extend this, calling the superclass method first. @param position: The initial position of this range; one of the POSITION_* constants or a position object supported by the implementation. @param obj: The object containing the range of text being represented.

class contentRecog.ContentRecognizer(*args, **kwargs)

Bases: AutoPropertyObject

Implementation of a content recognizer.

allowAutoRefresh: bool = False

Whether to allow automatic, periodic refresh when using this recognizer. This allows the user to see live changes as they occur. However, if a recognizer uses an internet service or is very resource intensive, this may be undesirable.

autoRefreshInterval: int = 1500

How often (in ms) to perform recognition.

getResizeFactor(width: int, height: int) → int | float

Return the factor by which an image must be resized before it is passed to this recognizer.
@param width: The width of the image in pixels.
@param height: The height of the image in pixels.
@return: The resize factor; C{1} for no resizing.

abstract recognize(pixels: Array, imageInfo: RecogImageInfo, onResult: Callable[[RecognitionResult | Exception], None])

Asynchronously recognize content from an image. This method should not block. Only one recognition can be performed at a time.

@param pixels: The pixels of the image as a two dimensional array of RGBQUADs. For example, to get the red value for the coordinate (1, 2): pixels[2][1].rgbRed. This can be treated as raw bytes in BGRA8 format; i.e. four bytes per pixel in the order blue, green, red, alpha. However, the alpha channel should be ignored.
@type pixels: Two dimensional array (y then x) of L{winGDI.RGBQUAD}
@param imageInfo: Information about the image for recognition.
@param onResult: A callable which takes a L{RecognitionResult} (or an exception on failure) as its only argument.
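As a short illustration of the pixel format (assuming pixels is a ctypes two dimensional array of L{winGDI.RGBQUAD} as described above):

    import ctypes

    # Red channel at coordinate (x=1, y=2): index by row (y) first, then column (x).
    red = pixels[2][1].rgbRed
    # The same buffer viewed as raw BGRA8 bytes, four bytes per pixel.
    raw = ctypes.string_at(ctypes.addressof(pixels), ctypes.sizeof(pixels))
    b, g, r, a = raw[0:4]  # first pixel; the alpha byte should be ignored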

abstract cancel()

Cancel the recognition in progress (if any).

validateCaptureBounds(location: RectLTWH) → bool

Validate the capture coordinates before creating the image for content recognition.

validateObject(nav: NVDAObject) → bool

Validation to be performed on the navigator object before content recognition.
@param nav: The navigator object to be validated.
@return: C{True} if the navigator object is valid, C{False} otherwise. The base implementation performs no validation and always returns C{True}.

class contentRecog.RecogImageInfo(screenLeft: int, screenTop: int, screenWidth: int, screenHeight: int, resizeFactor: int | float)

Bases: object

Encapsulates information about a recognized image and provides functionality to convert coordinates. An image captured for recognition can begin at any point on the screen. However, the image must be cropped when passed to the recognizer. Also, some recognizers need the image to be resized prior to recognition. This class calculates the width and height of the image for recognition; see the L{recogWidth} and L{recogHeight} attributes. It can also convert coordinates in the recognized image to screen coordinates suitable to be returned to NVDA; e.g. in order to route the mouse. This is done using the L{convertXToScreen} and L{convertYToScreen} methods.

@param screenLeft: The x screen coordinate of the upper-left corner of the image.
@param screenTop: The y screen coordinate of the upper-left corner of the image.
@param screenWidth: The width of the image on the screen.
@param screenHeight: The height of the image on the screen.
@param resizeFactor: The factor by which the image must be resized for recognition.
@raise ValueError: If the supplied screen coordinates indicate that the image is not visible; e.g. a width or height of 0.

recogWidth

The width of the recognized image.

recogHeight

The height of the recognized image.

classmethod createFromRecognizer(screenLeft: int, screenTop: int, screenWidth: int, screenHeight: int, recognizer: ContentRecognizer)

Convenience method to construct an instance using a L{ContentRecognizer}. The resize factor is obtained by calling L{ContentRecognizer.getResizeFactor}.

convertXToScreen(x)

Convert an x coordinate in the recognized image to an x coordinate on the screen.

convertYToScreen(y)

Convert a y coordinate in the recognized image to a y coordinate on the screen.

convertWidthToScreen(width)

Convert width in the recognized image to the width on the screen.

convertHeightToScreen(height)

Convert height in the recognized image to the height on the screen.
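To illustrate, a sketch with hypothetical values, assuming the conversions scale coordinates down by the resize factor and offset them by the screen origin:

    # Capture a 200x100 region at screen position (50, 40) with a recognizer
    # whose resize factor is 2.
    info = RecogImageInfo.createFromRecognizer(50, 40, 200, 100, recognizer)
    print(info.recogWidth, info.recogHeight)  # 400 200
    # Map a point in the recognized image back to the screen, e.g. to route the mouse.
    screenX = info.convertXToScreen(100)  # 50 + 100 / 2 = 100
    screenY = info.convertYToScreen(60)   # 40 + 60 / 2 = 70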

class contentRecog.RecognitionResult

Bases: TrackedObject

Provides access to the result of recognition by a recognizer. The result is textual, but to facilitate navigation by word, line, etc. and to allow for retrieval of screen coordinates within the text, L{TextInfo} objects are used. Callers use the L{makeTextInfo} method to create a L{TextInfo}. Most implementers should use one of the subclasses provided in this module.

abstract makeTextInfo(obj, position) → BaseContentRecogTextInfo

Make a TextInfo within the recognition result text at the requested position.
@param obj: The object to return for the C{obj} property of the TextInfo. The TextInfo itself doesn’t use this, but NVDA requires it to set the review object, etc.
@param position: The requested position; one of the C{textInfos.POSITION_*} constants.
@return: The TextInfo at the requested position in the result.

class contentRecog.LwrWord(offset, left, top, width, height)

Bases: tuple

Create new instance of LwrWord(offset, left, top, width, height). Each LwrWord describes a word in a L{LinesWordsResult}: its start offset in the result text and its screen coordinates.

_asdict()

Return a new dict which maps field names to their values.

_field_defaults = {}
_fields = ('offset', 'left', 'top', 'width', 'height')
classmethod _make(iterable)

Make a new LwrWord object from a sequence or iterable

_replace(**kwds)

Return a new LwrWord object replacing specified fields with new values

height

Alias for field number 4

left

Alias for field number 1

offset

Alias for field number 0

top

Alias for field number 2

width

Alias for field number 3

class contentRecog.LinesWordsResult(data: List[List[Dict[str, str | int]]], imageInfo: RecogImageInfo)

Bases: RecognitionResult

A L{RecognitionResult} which can create TextInfos based on a simple lines/words data structure. The data structure is a list of lines, wherein each line is a list of words, wherein each word is a dict containing the keys x, y, width, height and text. Several OCR engines produce output in a format which can be easily converted to this.

Constructor.
@param data: The lines/words data structure. For example:

    [
        [
            {"x": 106, "y": 91, "width": 11, "height": 9, "text": "Word1"},
            {"x": 117, "y": 91, "width": 11, "height": 9, "text": "Word2"}
        ],
        [
            {"x": 106, "y": 105, "width": 11, "height": 9, "text": "Word3"},
            {"x": 117, "y": 105, "width": 11, "height": 9, "text": "Word4"}
        ]
    ]

@param imageInfo: Information about the recognized image. This is used to convert coordinates in the recognized image to screen coordinates.
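For example, a recognizer whose engine already produces this structure might deliver its result as follows (a sketch; engineOutput and imageInfo are assumed to come from the surrounding recognize implementation):

    from contentRecog import LinesWordsResult

    # Inside a recognizer's recognize() implementation:
    result = LinesWordsResult(engineOutput, imageInfo)
    onResult(result)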

lines

End offsets for each line.

words

Start offsets and screen coordinates for each word.

_parseData()
makeTextInfo(obj, position)

Make a TextInfo within the recognition result text at the requested position.
@param obj: The object to return for the C{obj} property of the TextInfo. The TextInfo itself doesn’t use this, but NVDA requires it to set the review object, etc.
@param position: The requested position; one of the C{textInfos.POSITION_*} constants.
@return: The TextInfo at the requested position in the result.

class contentRecog.LwrTextInfo(*args, **kwargs)

Bases: BaseContentRecogTextInfo, OffsetsTextInfo

TextInfo used by L{LinesWordsResult}. This should only be instantiated by L{LinesWordsResult}.

Constructor. Subclasses must extend this, calling the superclass method first. @param position: The initial position of this range; one of the POSITION_* constants or a position object supported by the implementation. @param obj: The object containing the range of text being represented.

encoding: str | None = None

The encoding internal to the underlying text info implementation.

copy()

Duplicates this TextInfo object so that changes can be made to either one without affecting the other.

_getTextRange(start, end)

Retrieve the text in a given offset range. @param start: The start offset. @type start: int @param end: The end offset (exclusive). @type end: int @return: The text contained in the requested range. @rtype: str

_getStoryLength()
_getLineOffsets(offset)
_getWordOffsets(offset)
_getBoundingRectFromOffset(offset)
class contentRecog.SimpleTextResult(text)

Bases: RecognitionResult

A L{RecognitionResult} which presents a simple text string. This should only be used if the recognizer only returns text and no coordinate information. In this case, NVDA calculates words and lines itself based on the text; e.g. a new line character breaks a line. Routing the mouse, etc. cannot be supported because even though NVDA has the coordinates for the entire block of content, it doesn’t have the coordinates for individual words or characters.
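For instance, a minimal sketch:

    from contentRecog import SimpleTextResult

    # Plain text only: NVDA derives word and line boundaries from the string
    # itself, e.g. each newline character starts a new line.
    result = SimpleTextResult("First line\nSecond line")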

makeTextInfo(obj, position)

Make a TextInfo within the recognition result text at the requested position.
@param obj: The object to return for the C{obj} property of the TextInfo. The TextInfo itself doesn’t use this, but NVDA requires it to set the review object, etc.
@param position: The requested position; one of the C{textInfos.POSITION_*} constants.
@return: The TextInfo at the requested position in the result.

class contentRecog.SimpleResultTextInfo(*args, **kwargs)

Bases: BaseContentRecogTextInfo, OffsetsTextInfo

TextInfo used by L{SimpleTextResult}. This should only be instantiated by L{SimpleTextResult}.

Constructor. Subclasses must extend this, calling the superclass method first. @param position: The initial position of this range; one of the POSITION_* constants or a position object supported by the implementation. @param obj: The object containing the range of text being represented.

encoding: str | None = None

The encoding internal to the underlying text info implementation.

copy()

Duplicates this TextInfo object so that changes can be made to either one without affecting the other.

_getStoryText()

Retrieve the entire text of the object. @return: The entire text of the object. @rtype: str

_getStoryLength()

Submodules

contentRecog.recogUi module

User interface for content recognition. This module provides functionality to capture an image from the screen for the current navigator object, pass it to a content recognizer for recognition, and present the result to the user so they can read it with the cursor keys, etc. NVDA scripts or the GUI call the L{recognizeNavigatorObject} function with the recognizer they wish to use.

class contentRecog.recogUi.RecogResultNVDAObject(chooseBestAPI=True, **kwargs)

Bases: CursorManager, Window

Fake NVDAObject used to present a recognition result in a cursor manager. This allows the user to read the result with cursor keys, etc. Pressing enter will activate (e.g. click) the text at the cursor. Pressing escape dismisses the recognition result.

role: controlTypes.Role = 52

Type definition for auto prop ‘_get_role’

name: str = 'Result'

Type definition for auto prop ‘_get_name’

treeInterceptor: Optional[TreeInterceptor] = None

Type definition for auto prop ‘_get_treeInterceptor’

makeTextInfo(position)
setFocus()

Tries to force this object to take the focus.

_get_hasFocus() → bool

Whether this object has focus. @rtype: bool

script_activatePosition(gesture)

Activates the text at the cursor if possible

script_exit(gesture)

Dismiss the recognition result

script_find(gesture)

find a text string from the current cursor position

script_findNext(gesture)

find the next occurrence of the previously entered text string from the current cursor’s position

script_findPrevious(gesture)

find the previous occurrence of the previously entered text string from the current cursor’s position

__gestures = {'kb:enter': 'activatePosition', 'kb:escape': 'exit', 'kb:space': 'activatePosition'}
hasFocus: bool

Type definition for auto prop ‘_get_hasFocus’

class contentRecog.recogUi.RefreshableRecogResultNVDAObject(chooseBestAPI=True, **kwargs)

Bases: RecogResultNVDAObject, LiveText

NVDA object that is itself responsible for fetching the recognition result. It is also able to refresh the result at intervals when the recognizer supports it.

_recognize(onResult: Callable[[RecognitionResult | Exception], None])
_onFirstResult(result: RecognitionResult | Exception)
_scheduleRecognize()
_onResult(result: RecognitionResult | Exception)
event_gainFocus()

This code is executed if a gain focus event is received by this object.

event_loseFocus()
start()
contentRecog.recogUi._activeRecog = None

Keeps track of the recognition in progress, if any.

contentRecog.recogUi.recognizeNavigatorObject(recognizer: ContentRecognizer)

User interface function to recognize content in the navigator object. This should be called from a script or in response to a GUI action. @param recognizer: The content recognizer to use.

contentRecog.uwpOcr module

Recognition of text using the UWP OCR engine included in Windows 10 and later.

contentRecog.uwpOcr.getLanguages()

Return the available recognition languages.
@return: A list of language codes suitable to be passed to L{UwpOcr}’s constructor. These need to be normalized with L{languageHandler.normalizeLanguage} for use as NVDA language codes.
@rtype: list of str
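For instance, a sketch listing the available languages alongside their normalized NVDA codes:

    import languageHandler
    from contentRecog import uwpOcr

    for lang in uwpOcr.getLanguages():
        # e.g. "en-US" is normalized to "en_US" for use as an NVDA language code.
        print(lang, languageHandler.normalizeLanguage(lang))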

contentRecog.uwpOcr.getInitialLanguage()

Get the language to use the first time UWP OCR is used. The NVDA interface language is used if a matching OCR language is available. Otherwise, this falls back to the first available language.

contentRecog.uwpOcr._getInitialLanguage(nvdaLang, ocrLangs)
contentRecog.uwpOcr.getConfigLanguage()

Get the user’s configured OCR language. If no language has been configured, choose an initial language and update the configuration.

class contentRecog.uwpOcr.UwpOcr(*args, **kwargs)

Bases: ContentRecognizer

@param language: The language code of the desired recognition language, C{None} to use the user’s configured language.

classmethod _get_allowAutoRefresh() → bool
classmethod _get_autoRefreshInterval() → int
getResizeFactor(width, height)

Return the factor by which an image must be resized before it is passed to this recognizer. @param width: The width of the image in pixels. @param height: The height of the image in pixels. @return: The resize factor, C{1} for no resizing.

recognize(pixels, imgInfo, onResult)

Asynchronously recognize content from an image. This method should not block. Only one recognition can be performed at a time.

@param pixels: The pixels of the image as a two dimensional array of RGBQUADs. For example, to get the red value for the coordinate (1, 2): pixels[2][1].rgbRed. This can be treated as raw bytes in BGRA8 format; i.e. four bytes per pixel in the order blue, green, red, alpha. However, the alpha channel should be ignored.
@type pixels: Two dimensional array (y then x) of L{winGDI.RGBQUAD}
@param imageInfo: Information about the image for recognition.
@param onResult: A callable which takes a L{RecognitionResult} (or an exception on failure) as its only argument.

cancel()

Cancel the recognition in progress (if any).
