myClass
About Automated Classification
Automated Classification is a process by which a computer quickly
assigns a large volume of documents to the categories of an existing
classification, which can include thousands of categories and
several depth levels.
This type of application is needed in several contexts, the most
frequent of which is archiving. If hundreds of thousands, or perhaps
millions, of paper documents are being dematerialized (scanned) in order
to free office space and facilitate their re-use, they must be stored
in an adequate fashion, i.e. according to their topics or to any other
classification criterion. Otherwise, any keyword search will retrieve
thousands of results and the most relevant documents will be difficult,
if not impossible, to find. The problem is that manually classifying
this enormous volume of documents is not realistic: it is too costly
and too time-consuming.
An efficient solution to this problem is to train an artificial
intelligence system so that it learns the customer's specific
classification and identifies the most typical documents for each
category. The classification accuracy (precision) of the system is
then tested and, if it is high enough, the system is used to
classify all the documents automatically.
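The train, test, then classify workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a toy word-count classifier rather than a real artificial intelligence system, and the category names, sample texts, function names and the 0.9 accuracy threshold are all hypothetical.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Split a text into lowercase words."""
    return text.lower().split()

def train(examples):
    """Build one word-count profile per category from (text, category) pairs."""
    profiles = defaultdict(Counter)
    for text, category in examples:
        profiles[category].update(tokenize(text))
    return profiles

def classify(profiles, text):
    """Return the category whose profile gives the most weight to the text's words."""
    words = tokenize(text)

    def score(category):
        profile = profiles[category]
        return sum(profile[w] for w in words) / sum(profile.values())

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(profiles), key=score)

def accuracy(profiles, held_out):
    """Share of held-out documents assigned to their known category."""
    correct = sum(classify(profiles, text) == category for text, category in held_out)
    return correct / len(held_out)

# Hypothetical hand-labelled examples (the "typical documents" per category).
training_set = [
    ("mortgage interest rate fixed term", "mortgages"),
    ("mortgage loan property amortization", "mortgages"),
    ("payment transfer order beneficiary", "payments"),
    ("standing order payment account debit", "payments"),
]
# Hypothetical held-out documents used only to measure accuracy.
test_set = [
    ("fixed rate mortgage renewal", "mortgages"),
    ("outgoing payment transfer", "payments"),
]

ACCURACY_THRESHOLD = 0.9  # assumed acceptance level, to be agreed with the customer

profiles = train(training_set)
measured = accuracy(profiles, test_set)
# Only when the measured accuracy is high enough is the bulk run started.
ready_for_bulk_run = measured >= ACCURACY_THRESHOLD
print(f"accuracy={measured:.2f}, ready for bulk run: {ready_for_bulk_run}")
```

Keeping the test documents separate from the training examples is what makes the measured accuracy meaningful: the system is judged on documents it has not seen before.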
Other examples of automated classification needs
include the management of incoming flows of electronic
documents (e.g. in financial systems), the classification of
patent applications, etc.
myClass: Simple Shift's automated classification tool
myClass is an automatic classifier which uses neural network
technologies. It has to be trained on the customer's classification:
typical examples (documents) of the categories must be provided
manually so that it can learn to identify the terms and expressions
which best characterize each category. This phase may take some time,
but the system is then able to classify a document in a matter of
milliseconds across several thousand categories, and in several
languages if it was trained accordingly.
In the screenshot above, myClass was used to predict the category of a
document containing the German expression "Kommission 0.375 % p.a. auf
das Durchschnittskapital". The four possible categories, which are real
categories of financial statements produced by a Swiss bank, were the
following: "Hypotheken", "Konto_Reporting", "Wertschriften" and
"Zahlungsverkehr". myClass correctly indicated that the relevant
category was "Wertschriften", and also indicated its second and third
best predictions.
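A ranked list of best predictions like the one described above can be derived from per-category scores. The sketch below is an assumption about how such a ranking might be computed, not a description of myClass internals: the function name and the raw scores are invented for illustration, and the order of the runners-up is arbitrary rather than taken from the screenshot.

```python
import math

def top_predictions(raw_scores, k=3):
    """Turn raw per-category scores into softmax confidences and return the k best."""
    highest = max(raw_scores.values())
    # Subtracting the maximum keeps exp() numerically stable.
    exps = {category: math.exp(score - highest) for category, score in raw_scores.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((category, value / total) for category, value in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]

# Invented scores for illustration only; they are not myClass output.
raw_scores = {
    "Hypotheken": 0.4,
    "Konto_Reporting": 1.1,
    "Wertschriften": 3.2,
    "Zahlungsverkehr": 0.9,
}

best_three = top_predictions(raw_scores)
for category, confidence in best_three:
    print(f"{category}: {confidence:.2f}")
```

Showing the runners-up alongside the top category lets a human reviewer see how close the decision was, and pick an alternative when the first prediction is wrong.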
myClass is also
used, for example, by a major international organization
to help
human examiners categorize patents in the International Patent
Classification (IPC). This application is
called IPCCAT and may be accessed here.
The IPCCAT example is based on
a human-machine interaction, but myClass may also be used as an
internal module in a larger application, often in the context of
document management and archiving solutions. It
can be integrated in OEM mode, for example in content
management (CMS) solutions.