
Simple Shift  
Language Engineering
     

 


 
 

myClass

About Automated Classification

Automated Classification is a process by which a computer quickly classifies a large volume of documents within an existing classification, which can include thousands of categories and several levels of depth.

This type of application is needed in several contexts, the most frequent of which is archiving. If hundreds of thousands, or perhaps millions, of paper documents are being dematerialized (scanned) in order to free office space and facilitate their re-use, they must be stored in an adequate fashion, i.e. according to their topics or to any other classification criterion. Otherwise any keyword search will retrieve thousands of results, and the most relevant documents will be difficult, if not impossible, to find. The problem is that manually classifying this enormous volume of documents is not realistic: it is too costly and too time-consuming.

An efficient solution to this problem is to train an artificial intelligence system so that it learns the customer's specific classification and identifies the most typical documents for each category. The classification accuracy (precision) of the system is then tested, and if it is high enough, the system is used to classify all the documents automatically.
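The train, test, then classify workflow described above can be sketched as follows. This is a minimal illustration only: it uses a simple term-frequency profile per category as a stand-in for a real learning system, and the function names, the sample texts and the 0.9 acceptance threshold are all assumptions made for the example, not part of any actual product.

```python
from collections import Counter, defaultdict


def train(examples):
    """Build a term-frequency profile per category from (text, category) pairs."""
    profiles = defaultdict(Counter)
    for text, category in examples:
        profiles[category].update(text.lower().split())
    return dict(profiles)


def classify(profiles, text):
    """Return the category whose profile gives the text's terms the most weight."""
    tokens = text.lower().split()

    def score(category):
        profile = profiles[category]
        # Normalize by profile size so large categories are not favored.
        return sum(profile[t] for t in tokens) / sum(profile.values())

    return max(sorted(profiles), key=score)


def accuracy(profiles, held_out):
    """Share of held-out documents assigned to their known category."""
    correct = sum(1 for text, cat in held_out if classify(profiles, text) == cat)
    return correct / len(held_out)


def ready_for_bulk_run(profiles, held_out, threshold=0.9):
    """Gate the bulk classification on the measured accuracy (threshold is assumed)."""
    return accuracy(profiles, held_out) >= threshold


# Hypothetical training and test documents, invented for this sketch.
training = [
    ("mortgage interest rate property loan", "mortgages"),
    ("fixed rate mortgage on property", "mortgages"),
    ("securities custody fee shares bonds", "securities"),
    ("purchase of shares and bonds", "securities"),
]
held_out = [
    ("loan secured on property", "mortgages"),
    ("custody of bonds", "securities"),
]

profiles = train(training)
if ready_for_bulk_run(profiles, held_out):
    print(classify(profiles, "interest on the property loan"))
```

Keeping the test documents separate from the training documents matters: measuring accuracy on examples the system has already seen would overstate how well it will handle the rest of the archive.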

Other examples of automated classification needs include the management of incoming flows of electronic documents (e.g. in financial systems), the classification of patent applications, etc.

myClass: Simple Shift's automated classification tool

[Screenshot: myClass]

myClass is an automatic classifier which uses Neural Network technologies. It has to be trained on the customer's classification: typical examples (documents) of the categories must be provided manually so that it can learn to identify the terms and expressions which best characterize each category. This training phase may take some time, but afterwards the system is able to classify a document in a matter of milliseconds across several thousand categories, and in several languages if it was trained accordingly.

In the screenshot above, myClass was used to predict which category a document containing the German expression "Kommission 0.375 % p.a. auf das Durchschnittskapital" would belong to. The four possible categories, which are real categories of financial statements produced by a Swiss bank, were the following: "Hypotheken", "Konto_Reporting", "Wertschriften" and "Zahlungsverkehr". myClass correctly indicated that the relevant category was "Wertschriften", and also indicated its second and third best predictions.
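A ranked list of best predictions, like the one described above, is commonly obtained by turning a classifier's raw per-category scores into probabilities and sorting them. The sketch below shows that general technique using softmax; it is not a description of how myClass works internally. The raw scores are placeholder numbers invented for the example, so only the top category matches the result reported above, and the order of the runners-up is arbitrary.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def top_k(scores, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Placeholder scores (assumed, not real classifier output).
raw_scores = {
    "Hypotheken": 0.4,
    "Konto_Reporting": 1.1,
    "Wertschriften": 2.7,
    "Zahlungsverkehr": 0.9,
}

ranking = top_k(raw_scores)
for label, probability in ranking:
    print(f"{label}: {probability:.2f}")
```

Showing several candidates rather than a single answer is useful when a human reviews the result: if the first prediction is wrong, the correct category is often among the next few.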

myClass is also used, for example, by a major international organization to help human examiners categorize patents in the International Patent Classification (IPC). This application is called IPCCAT and may be accessed here.

The IPCCAT example is based on human-machine interaction, but myClass may also be used as an internal module in a larger application, often in the context of document management and archiving solutions. It can be integrated in OEM mode, for example in content management system (CMS) solutions.