Nowadays, Optical Character Recognition (OCR) is the preferred way to digitize documents. Instead of entering document metadata manually, the OCR engine identifies the text in the documents fed into the document management system, so you can work with the plain text without having to read it yourself. For JavaScript, there's a popular solution based on the Tesseract OCR engine: the Tesseract.js project. Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. The library supports over 60 languages, automatic text orientation and script detection, and offers a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js runs both in the browser and on a server with Node.js, which makes it available on many platforms. In this article, we'll show how to use Tesseract.js in the browser to convert an image to text (extract text from an image).

Installing Tesseract.js

As mentioned, you can use the Tesseract.js library in the browser either from a CDN or from a local copy (for more information about this library, please visit the official repository on GitHub). Tesseract.js works as follows: you need 2 scripts, namely tesseract.js and its tesseract-worker.js. As expected, to achieve acceptable performance in the browser, the script uses a web worker located in another file (tesseract-worker.js). You only need to include tesseract.js yourself; the worker needs to be in the same directory, because the script loads it automatically.

Using the free CDN, you only need to include the tesseract script in your document, and it will automatically load the worker in the background. It will also automatically load the trained data for the language that you need from the CDN (something you have to do yourself if you host a local copy). After including this simple script, you will be ready to use Tesseract, so continue with step 2.

If using a CDN is not an option for you, then you will want a local copy of the script on your own server.
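
To picture the "same directory" requirement for the worker file, here is a minimal sketch. The helper `siblingWorkerUrl` and the `example.com` host are hypothetical and only for illustration. This is not how Tesseract.js itself resolves its worker; it just shows how a file name next to the main script maps to a URL, using the standard `URL` constructor.

```javascript
// Hypothetical helper (not part of Tesseract.js): given the URL of the main
// script, return the URL of a worker file that sits in the same directory.
function siblingWorkerUrl(scriptUrl, workerFile = 'tesseract-worker.js') {
  // The URL constructor resolves a relative file name against the base URL,
  // replacing the last path segment of the base (here: "tesseract.js").
  return new URL(workerFile, scriptUrl).href;
}

// Placeholder host and path, assumed for this example only.
const workerUrl = siblingWorkerUrl('https://example.com/js/tesseract.js');
console.log(workerUrl); // https://example.com/js/tesseract-worker.js
```

If you keep both files in one folder on your server (for example `/js/`), the worker ends up at the location this sketch computes, which is where a script that expects its worker in the same directory would look for it.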