Python API

botcity.document_processing.parser.document.DocumentParser

add_entry(self, entry)

Add an entry into the parser list.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| entry | Entry | The entry to be added. | required |

clear(self)

Clear the list of entries.

combined_entries(self, *args)

Combine a list of entries into a new merged entry.

Returns:

| Type | Description |
| --- | --- |
| Entry | The new merged entry. |

get_entries(self)

Get the parser entries.

Returns:

| Type | Description |
| --- | --- |
| List[Entry] | The parser entries. |

get_first_entry(self, text='', entry=0)

Get the first entry which meets the text criteria.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | Optional[str] | The entry text. Defaults to "". | '' |
| entry | Optional[Union[int, Entry]] | Reference Entry or index to use as start point for the search. Defaults to 0. | 0 |

Returns:

| Type | Description |
| --- | --- |
| Entry | The corresponding entry. |

get_first_entry_contains(self, text='', entry=0)

Get the first entry which contains the text criteria.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | Optional[str] | The entry partial text. Defaults to "". | '' |
| entry | Optional[Union[int, Entry]] | Reference Entry or index to use as start point for the search. Defaults to 0. | 0 |

Returns:

| Type | Description |
| --- | --- |
| Entry | The corresponding entry. |
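One way to combine the two searches is to try the exact match first and fall back to the partial match. This is a minimal sketch, not part of the library: `find_label` is a hypothetical helper, `StubParser` is a stand-in that lets the snippet run without BotCity installed, and the code assumes a failed search returns `None`, which this reference does not state.

```python
class StubParser:
    """Stand-in exposing only the two documented search methods.

    Entries are plain strings here; the real parser works with Entry objects.
    """

    def __init__(self, texts):
        self._texts = list(texts)

    def get_first_entry(self, text="", entry=0):
        # Exact text match, starting at index `entry`.
        return next((t for t in self._texts[entry:] if t == text), None)

    def get_first_entry_contains(self, text="", entry=0):
        # Partial text match, starting at index `entry`.
        return next((t for t in self._texts[entry:] if text in t), None)


def find_label(parser, label, start=0):
    """Hypothetical helper: exact match first, partial match as fallback."""
    hit = parser.get_first_entry(label, start)
    if hit is None:  # assumption: a miss is reported as None
        hit = parser.get_first_entry_contains(label, start)
    return hit


parser = StubParser(["Invoice No: 42", "Date:", "2024-01-01"])
exact = find_label(parser, "Date:")
partial = find_label(parser, "Invoice")
```

With a real `DocumentParser`, the same helper would receive the parser returned by `PDFReader.read_file` in place of the stub.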

get_full_text(self)

Returns the full document text.

Returns:

| Type | Description |
| --- | --- |
| str | The document text. |
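When a value follows a predictable textual pattern, searching the full text with a regular expression can be simpler than anchoring on entries. The sketch below is only an illustration: `search_text` is a hypothetical helper, and `StubParser` stands in for a real parser so the snippet runs without the library.

```python
import re


class StubParser:
    """Stand-in exposing only get_full_text()."""

    def __init__(self, text):
        self._text = text

    def get_full_text(self):
        return self._text


def search_text(parser, pattern):
    """Hypothetical helper: return the first capture group of `pattern`
    found in the document text, or None when nothing matches.
    `pattern` must contain at least one capture group."""
    match = re.search(pattern, parser.get_full_text())
    return match.group(1) if match else None


parser = StubParser("Invoice No: 1042\nTotal: 99.50")
invoice_number = search_text(parser, r"Invoice No:\s*(\d+)")
```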

get_last_entry(self)

Get the last entry on the parser's entry list.

Returns:

| Type | Description |
| --- | --- |
| Entry | The last entry. |

get_n_entry(self, text='', entry=0, count=1)

Get the nth entry corresponding to the parameters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | Optional[str] | The entry text. Defaults to "". | '' |
| entry | Optional[Union[int, Entry]] | Reference Entry or index to use as start point for the search. Defaults to 0. | 0 |
| count | Optional[int] | Index of search to return. 1 means first entry, 2 means second entry, etc. Defaults to 1. | 1 |

Returns:

| Type | Description |
| --- | --- |
| Entry | The corresponding entry. |
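Because `count` is 1-based, every occurrence of a label can be collected by increasing `count` until the search misses. The following is a rough sketch under stated assumptions: `all_matches` and its `limit` parameter are hypothetical, `StubParser` is a stand-in so the code runs without the library, and a miss is assumed to return `None`.

```python
class StubParser:
    """Stand-in exposing only get_n_entry().

    Entries are (index, text) tuples here; the real parser uses Entry objects.
    """

    def __init__(self, texts):
        self._entries = list(enumerate(texts))

    def get_n_entry(self, text="", entry=0, count=1):
        hits = [e for e in self._entries[entry:] if e[1] == text]
        return hits[count - 1] if count <= len(hits) else None


def all_matches(parser, text, limit=50):
    """Hypothetical helper: gather the 1st, 2nd, ... match for `text`."""
    found = []
    for n in range(1, limit + 1):
        entry = parser.get_n_entry(text, 0, n)
        if entry is None:  # assumption: a miss is reported as None
            break
        found.append(entry)
    return found


parser = StubParser(["Total", "10", "Total", "20"])
totals = all_matches(parser, "Total")
```

The `limit` guard keeps the loop bounded in case a parser never reports a miss.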

get_second_entry(self, text='', entry=0)

Get the second entry which meets the text criteria.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | Optional[str] | The entry text. Defaults to "". | '' |
| entry | Optional[Union[int, Entry]] | Reference Entry or index to use as start point for the search. Defaults to 0. | 0 |

Returns:

| Type | Description |
| --- | --- |
| Entry | The corresponding entry. |

load_entries(self, entries, sort=True)

Load entries into the parser.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| entries | List | List of Entry objects or List of List containing the required information. | required |
| sort | bool | Sort the entries. Defaults to True. | True |

print(self)

Print the list of entries.

read(self, entry, margin_left, margin_right, margin_top, margin_bottom, line_height=None, data_type=None, right_reference=None, bottom_reference=None)

Read an area and return its content.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| entry | Entry | The anchor entry. | required |
| margin_left | float | Proportion from the anchor's left corner. | required |
| margin_right | float | Proportion from the anchor's right corner. | required |
| margin_top | float | Proportion from the anchor's top. | required |
| margin_bottom | float | Proportion from the anchor's bottom. | required |
| line_height | Optional[int] | Line height for compensation. Defaults to None. | None |
| data_type | [type] | Expected data type for use with OCR to correct for possible reading artifacts. Defaults to None. | None |
| right_reference | Optional[Entry] | Reference Entry to use as right anchor. Defaults to None. | None |
| bottom_reference | Optional[Entry] | Reference Entry to use as bottom anchor. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| str | The text content from the area. |
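A typical use of `read` is to locate an anchor with one of the search methods and then read an area relative to it. The sketch below shows that flow for several fields at once. Everything outside the documented method names is an assumption: `extract_fields` is a hypothetical helper, the margin values are placeholders with no geometric meaning, `StubParser` returns canned text so the snippet runs without the library, and a failed search is assumed to return `None`.

```python
class StubParser:
    """Stand-in exposing get_first_entry() and read() with canned values."""

    def __init__(self, values):
        self._values = values  # maps label text -> text "read" near it

    def get_first_entry(self, text="", entry=0):
        return text if text in self._values else None

    def read(self, entry, margin_left, margin_right, margin_top,
             margin_bottom, line_height=None, data_type=None,
             right_reference=None, bottom_reference=None):
        # A real parser would use the margins to select an area around
        # the anchor; the stub ignores them.
        return self._values[entry]


def extract_fields(parser, specs):
    """Hypothetical helper: specs maps a label to its four margins
    (left, right, top, bottom); labels that are not found are skipped."""
    result = {}
    for label, (left, right, top, bottom) in specs.items():
        anchor = parser.get_first_entry(label)
        if anchor is None:  # assumption: a miss is reported as None
            continue
        result[label] = parser.read(anchor, left, right, top, bottom).strip()
    return result


parser = StubParser({"Date:": " 2024-01-01 "})
specs = {"Date:": (1.5, 3.0, 0.0, 0.0), "Total:": (1.0, 2.0, 0.0, 0.0)}
fields = extract_fields(parser, specs)
```

Keeping the margins in a single mapping makes them easy to adjust per document layout without touching the extraction loop.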

set_entries(self, entries, sort=True)

Sets the list of entries.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| entries | List[Entry] | List of entries. | required |
| sort | bool | Sort the entries. Defaults to True. | True |

botcity.document_processing.pdf.pdfreader.PDFReader

page_height: float (read-only property)

PDF page height.

page_width: float (read-only property)

PDF page width.

read_file(self, file)

Read the given PDF file and return a new instance of the DocumentParser.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file | str | PDF file path. | required |

Returns:

| Type | Description |
| --- | --- |
| DocumentParser | The document parser to be used. |
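Putting the reader and parser together, a small end-to-end flow might look like the sketch below. It is illustrative only: `summarize_pdf` and `make_reader` are hypothetical helpers, the stub classes exist so the snippet runs without the library or a PDF file, and two things are assumed rather than documented here: that `PDFReader` can be constructed without arguments, and that the page size properties are meaningful once `read_file` has been called.

```python
class StubParser:
    """Stand-in for the DocumentParser returned by read_file()."""

    def get_entries(self):
        return ["Date:", "2024-01-01"]

    def get_full_text(self):
        return "Date: 2024-01-01"


class StubReader:
    """Stand-in for PDFReader with fixed page dimensions."""

    page_width = 612.0
    page_height = 792.0

    def read_file(self, file):
        return StubParser()


def make_reader():
    """Create a real reader; needs the BotCity package to be installed."""
    # Import path taken from the class name documented above.
    from botcity.document_processing.pdf.pdfreader import PDFReader
    return PDFReader()  # assumption: no constructor arguments are required


def summarize_pdf(reader, path):
    """Hypothetical helper: read a PDF and collect basic facts about it."""
    parser = reader.read_file(path)
    return {
        "width": reader.page_width,   # read after read_file (assumption)
        "height": reader.page_height,
        "entries": len(parser.get_entries()),
        "text": parser.get_full_text(),
    }


summary = summarize_pdf(StubReader(), "invoice.pdf")
```

To run against a real document, pass `make_reader()` instead of `StubReader()` along with the path to an existing PDF.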