
Processing text-type documents

With the project properly loaded in BotCity Studio, we can load a new document to start building the reading template.

If your PDF document contains selectable text, leave the Provider: Text option selected when loading it in BotCity Studio.

load-project

Tip

We suggest keeping the option Generate code for imports and load file? checked when loading the PDF document. This generates a code snippet in your source code with everything necessary to import, instantiate, and configure the parser that will process the document.

Python

# Remember to install and add to your requirements.txt the following packages:
# botcity-documents

# Import the packages
from botcity.document_processing import PDFReader

# Read the file and instantiate the reader
parser = PDFReader().read_file(r"pdf_file_path")

Java

/* Add the following into the imports section of your code.
import java.io.File;
// Import the PDFReader, DocumentParser and Entry objects
import dev.botcity.botcity_document_processing.parser.DocumentParser;
import dev.botcity.botcity_document_processing.parser.Entry;
import dev.botcity.botcity_document_processing.pdf.PdfReader;
*/

// Read the file and instantiate the parser
DocumentParser parser = null;
try {
    parser = new PdfReader().readFile(new File("pdf_file_path"));
} catch (Exception e) {
    e.printStackTrace();
}

String value = "";

Creating templates

Now, creating a reading template for a document is very simple.

Just click and drag the mouse to select the field you want to read (outlined in red). Then, select the read area related to that field (outlined in blue) as shown in the image below:

docs-gif

Repeat this process for each field you need to read, and your custom parser is built quickly. After you select all the fields, you will see that the code was generated automatically.

code-automatic

Complete code

Python

from botcity.document_processing import PDFReader

def parse_file(filename):

    # Instantiate Reader, read the file and get the parser
    parser = PDFReader().read_file(filename)

    # Account No
    _account_no = parser.get_first_entry("Account No:")
    value = parser.read(_account_no, 1.04386, -2.5, 1.754386, 4)
    print(f'Account no: {value}')

    # Statement Date
    _statement_date = parser.get_first_entry("Statement Date:")
    value = parser.read(_statement_date, 1.032, -2, 1.24, 3.6)
    print(f'Statement date: {value}')

    # Due Date
    _due_date = parser.get_first_entry("Due Date:")
    value = parser.read(_due_date, 1.025316, -1.6, 2.025316, 3)
    print(f'Due Date: {value}')

    # Total Amount Due
    _total_amount_due = parser.get_first_entry("Total Amount Due:")
    value = parser.read(_total_amount_due, -0.032967, 4, 1.032967, 4.666667)
    print(f'Total amount due: {value}')

parse_file(filename="statement.pdf")

Java

import java.io.File;

import dev.botcity.botcity_document_processing.parser.*;
import dev.botcity.botcity_document_processing.pdf.*;

// Place this method inside your bot's main class.
// Exceptions raised while reading the file are propagated to the caller.
public static void main(String[] args) throws Exception {

    // Instantiate Reader
    PdfReader reader = new PdfReader();

    // Read the file and get the parser
    DocumentParser parser = reader.readFile(new File("./statement.pdf"));

    // Account No
    Entry _account_no = parser.getFirstEntry("Account No:");
    String account = parser.read(_account_no, 1.342105, -2.625, 1.256579, 4.5);
    System.out.println("Account no: " + account);

    // Statement Date
    Entry _statement_date = parser.getFirstEntry("Statement Date:");
    String statementDate = parser.read(_statement_date, 1.265, -1.5, 0.955, 2.875);
    System.out.println("Statement date: " + statementDate);

    // Due Date
    Entry _due_date = parser.getFirstEntry("Due Date:");
    String dueDate = parser.read(_due_date, 1.420635, -1.875, 1.531746, 3.625);
    System.out.println("Due Date: " + dueDate);

    // Total Amount Due
    Entry _total = parser.getFirstEntry("Total");
    String total = parser.read(_total, 7.953125, -3, 3.0625, 5.75);
    System.out.println("Total amount due: " + total);
}
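
The generated Python code repeats the same two calls, get_first_entry and read, for every field. As a minimal sketch (not something BotCity Studio generates), the template could instead be kept as data and read in a loop. The names FIELDS and read_fields are hypothetical. The labels and numeric values are copied from the generated Python code above, and the only assumption about the parser is that it exposes get_first_entry and read as used there.

```python
# Field name -> (anchor label, read-area values), copied from the generated code.
FIELDS = {
    "account_no": ("Account No:", (1.04386, -2.5, 1.754386, 4)),
    "statement_date": ("Statement Date:", (1.032, -2, 1.24, 3.6)),
    "due_date": ("Due Date:", (1.025316, -1.6, 2.025316, 3)),
    "total_amount_due": ("Total Amount Due:", (-0.032967, 4, 1.032967, 4.666667)),
}


def read_fields(parser, fields=FIELDS):
    """Read every field described in `fields` and return the values as a dict.

    `parser` is assumed to be the object returned by PDFReader().read_file(...).
    """
    values = {}
    for name, (label, area) in fields.items():
        # Locate the anchor label, then read the area selected for it.
        anchor = parser.get_first_entry(label)
        values[name] = parser.read(anchor, *area)
    return values
```

With this in place, parse_file could call read_fields(PDFReader().read_file(filename)) and return the resulting dictionary instead of printing each value, which makes the extracted data easier to reuse in later steps.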

The result of the analysis and reading of the document

Running the template above and printing the returned values produces the following result:

Account no: 1023456789-0
Statement date: 03/08/2016
Due Date: 03/29/2016
Total amount due: $115.28
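
The values above are read as plain strings. If later steps need real dates or numbers, they can be converted with the Python standard library. The sketch below is only an illustration: the helper name parse_statement_values is hypothetical, the MM/DD/YYYY date format is inferred from the sample output (03/29/2016), and the amount is assumed to carry a leading "$".

```python
from datetime import datetime
from decimal import Decimal


def parse_statement_values(raw):
    """Convert the raw strings read from the statement into typed values."""

    def to_date(text):
        # Assumed format: MM/DD/YYYY, as in the sample statement.
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()

    def to_amount(text):
        # Drop the assumed currency symbol and any thousands separators.
        return Decimal(text.strip().lstrip("$").replace(",", ""))

    return {
        "account_no": raw["account_no"].strip(),
        "statement_date": to_date(raw["statement_date"]),
        "due_date": to_date(raw["due_date"]),
        "total_amount_due": to_amount(raw["total_amount_due"]),
    }


# Sample values taken from the result shown above.
raw = {
    "account_no": "1023456789-0",
    "statement_date": "03/08/2016",
    "due_date": "03/29/2016",
    "total_amount_due": "$115.28",
}
parsed = parse_statement_values(raw)
print(parsed["due_date"].isoformat())  # 2016-03-29
print(parsed["total_amount_due"])      # 115.28
```

Decimal is used rather than float so that currency amounts keep their exact value.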