Python API
botcity.document_processing.parser.document.DocumentParser
add_entry(self, entry)
Add an entry to the parser's entry list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entry | Entry | The entry to be added. | required |
clear(self)
Clear the list of entries.
combined_entries(self, *args)
Combine the given entries (passed as positional arguments) into a new merged entry.
Returns:
Type | Description |
---|---|
Entry | The new merged entry. |
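The reference above does not say how the merged entry is built. A minimal sketch of one plausible rule, assuming every entry carries its text plus a bounding box (x1, y1, x2, y2): the merged box is the union of the input boxes, and the texts are joined with spaces in argument order. Box and merge_entries are hypothetical stand-ins, not the library's Entry class or its implementation.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for an Entry: text plus a bounding box."""
    text: str
    x1: float
    y1: float
    x2: float
    y2: float


def merge_entries(*entries: Box) -> Box:
    """Merge entries into one whose box covers all inputs (assumed rule)."""
    if not entries:
        raise ValueError("at least one entry is required")
    return Box(
        # Assumed: texts are space-joined in the order the entries are passed.
        text=" ".join(e.text for e in entries),
        # The merged box is the smallest box containing every input box.
        x1=min(e.x1 for e in entries),
        y1=min(e.y1 for e in entries),
        x2=max(e.x2 for e in entries),
        y2=max(e.y2 for e in entries),
    )


merged = merge_entries(Box("Total", 10, 50, 40, 60), Box("Due", 45, 50, 70, 62))
print(merged)  # Box(text='Total Due', x1=10, y1=50, x2=70, y2=62)
```

Taking the union of the boxes keeps the merged entry usable as an anchor for area reads, since it still encloses every word it was built from.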
get_entries(self)
Return the list of entries held by the parser.
Returns:
Type | Description |
---|---|
List[Entry] | The parser entries. |
get_first_entry(self, text='', entry=0)
Get the first entry whose text matches the given text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Optional[str] | The entry text. Defaults to "". | '' |
entry | Optional[Union[int, Entry]] | Reference Entry or index to use as the start point for the search. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
Entry | The corresponding entry. |
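To illustrate how the start point is meant to work, here is a minimal model of get_first_entry over plain Python objects. It assumes exact, case-sensitive matching, that an int start is an inclusive list index, that an entry reference is resolved to its own position, and that a failed search returns None; none of these details is confirmed above, and Item and first_match are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Item:
    """Hypothetical stand-in for an Entry; only the text is modeled."""
    text: str


def first_match(items: List[Item], text: str = "",
                start: Union[int, Item] = 0) -> Optional[Item]:
    """Return the first item whose text equals `text`, searching from `start`."""
    if isinstance(start, Item):
        # Assumed: a reference entry is resolved to its own position. Identity
        # is used because two entries can share the same text.
        positions = [i for i, it in enumerate(items) if it is start]
        if not positions:
            raise ValueError("reference entry is not in the list")
        start = positions[0]
    # Assumed: exact, case-sensitive comparison; the start index is included.
    for item in items[start:]:
        if item.text == text:
            return item
    return None  # assumed behavior when nothing matches


items = [Item("Total"), Item("10.00"), Item("Total"), Item("25.00")]
print(first_match(items, "Total") is items[0])                  # True
print(first_match(items, "Total", start=1) is items[2])         # True
print(first_match(items, "Total", start=items[1]) is items[2])  # True
```

Passing a previously found entry as the start point is what lets repeated labels such as a second "Total" be reached without counting indices by hand.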
get_first_entry_contains(self, text='', entry=0)
Get the first entry whose text contains the given text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Optional[str] | The partial entry text to search for. Defaults to "". | '' |
entry | Optional[Union[int, Entry]] | Reference Entry or index to use as the start point for the search. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
Entry | The corresponding entry. |
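The difference between get_first_entry and get_first_entry_contains can be sketched with two hypothetical helpers working on plain strings: the first assumes an equality test, the second Python's substring test. Both return the list index of the match, or None when nothing matches, which is an assumption about the failure case.

```python
from typing import List, Optional


def first_exact(texts: List[str], text: str, start: int = 0) -> Optional[int]:
    """Index of the first text equal to `text` (assumed get_first_entry rule)."""
    for i in range(start, len(texts)):
        if texts[i] == text:
            return i
    return None


def first_contains(texts: List[str], text: str, start: int = 0) -> Optional[int]:
    """Index of the first text containing `text` (assumed get_first_entry_contains rule)."""
    for i in range(start, len(texts)):
        if text in texts[i]:
            return i
    return None


texts = ["Invoice No. 123", "Invoice", "Date: 2021-05-01"]
print(first_exact(texts, "Invoice"))     # 1
print(first_contains(texts, "Invoice"))  # 0
print(first_contains(texts, "Date"))     # 2
```

Under the substring rule the default empty text matches every entry, so a search with text="" simply returns the entry at the start point.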
get_full_text(self)
Return the full document text.
Returns:
Type | Description |
---|---|
str | The document text. |
get_last_entry(self)
Get the last entry in the parser's entry list.
Returns:
Type | Description |
---|---|
Entry | The last entry. |
get_n_entry(self, text='', entry=0, count=1)
Get the nth entry that matches the given text, where n is given by count.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Optional[str] | The entry text. Defaults to "". | '' |
entry | Optional[Union[int, Entry]] | Reference Entry or index to use as the start point for the search. Defaults to 0. | 0 |
count | Optional[int] | Which match to return: 1 returns the first match, 2 the second, and so on. Defaults to 1. | 1 |
Returns:
Type | Description |
---|---|
Entry | The corresponding entry. |
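The count parameter is 1-based. A hypothetical model of get_n_entry on plain strings, assuming exact matching, an inclusive start index, and None when there are fewer than count matches; nth_match is an illustrative name, not a library function.

```python
from typing import List, Optional


def nth_match(texts: List[str], text: str = "", start: int = 0,
              count: int = 1) -> Optional[int]:
    """Return the index of the count-th text equal to `text`, searching from `start`."""
    if count < 1:
        raise ValueError("count is 1-based and must be >= 1")
    seen = 0
    for i in range(start, len(texts)):
        if texts[i] == text:  # assumed: exact match, as in get_first_entry
            seen += 1
            if seen == count:
                return i
    return None  # assumed: fewer than `count` matches


texts = ["Qty", "2", "Qty", "5", "Qty", "1"]
print(nth_match(texts, "Qty", count=2))           # 2
print(nth_match(texts, "Qty", start=1, count=2))  # 4
print(nth_match(texts, "Qty", count=4))           # None
```

If the library behaves like this model, get_second_entry(text, entry) would be equivalent to get_n_entry(text, entry, count=2).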
get_second_entry(self, text='', entry=0)
Get the second entry whose text matches the given text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Optional[str] | The entry text. Defaults to "". | '' |
entry | Optional[Union[int, Entry]] | Reference Entry or index to use as the start point for the search. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
Entry | The corresponding entry. |
load_entries(self, entries, sort=True)
Load entries into the parser.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entries | List | A list of Entry objects, or a list of lists containing the information required to build each entry. | required |
sort | bool | Whether to sort the entries. Defaults to True. | True |
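The sort order applied when sort=True is not specified above. A common choice for document text, assumed here only for illustration, is reading order: top to bottom, then left to right, with y growing downward. Box, reading_order and load are hypothetical names. Real documents usually need a tolerance so that entries on the same visual line with slightly different y values still sort left to right, which this sketch omits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical stand-in for an Entry, reduced to text and top-left corner."""
    text: str
    x1: float
    y1: float


def reading_order(entries: List[Box]) -> List[Box]:
    # Assumed sort rule: top-to-bottom first, then left-to-right for equal y.
    return sorted(entries, key=lambda e: (e.y1, e.x1))


def load(entries: List[Box], sort: bool = True) -> List[Box]:
    # Mirrors the sort flag: either reading order or the caller's original order.
    return reading_order(entries) if sort else list(entries)


raw = [Box("Total", 10, 80), Box("Name:", 10, 20), Box("ACME", 60, 20)]
print([e.text for e in load(raw)])              # ['Name:', 'ACME', 'Total']
print([e.text for e in load(raw, sort=False)])  # ['Total', 'Name:', 'ACME']
```

Sorting matters for the search methods above: "first" and "nth" entry only have a stable meaning once the list order is fixed.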
print(self)
Print the list of entries.
read(self, entry, margin_left, margin_right, margin_top, margin_bottom, line_height=None, data_type=None, right_reference=None, bottom_reference=None)
Read an area positioned relative to the anchor entry and return its text content.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entry | Entry | The anchor entry. | required |
margin_left | float | Proportion from the anchor's left corner. | required |
margin_right | float | Proportion from the anchor's right corner. | required |
margin_top | float | Proportion from the anchor's top. | required |
margin_bottom | float | Proportion from the anchor's bottom. | required |
line_height | Optional[int] | Line height for compensation. Defaults to None. | None |
data_type | [type] | Expected data type, used with OCR to correct possible reading artifacts. Defaults to None. | None |
right_reference | Optional[Entry] | Reference Entry to use as the right anchor. Defaults to None. | None |
bottom_reference | Optional[Entry] | Reference Entry to use as the bottom anchor. Defaults to None. | None |
Returns:
Type | Description |
---|---|
str | The text content from the area. |
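The exact margin arithmetic is not documented here. One plausible reading, used only for illustration: each margin is a multiple of the anchor's width (left/right) or height (top/bottom), offset from the matching edge of the anchor, with y growing downward so a negative margin_top moves the top edge up. The result is the text of every entry whose box lies entirely inside that region. Box, region_from_anchor and read_region are hypothetical; line_height, data_type and the reference anchors are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Hypothetical stand-in for an Entry: text plus bounding box."""
    text: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def region_from_anchor(anchor: Box, margin_left: float, margin_right: float,
                       margin_top: float, margin_bottom: float
                       ) -> Tuple[float, float, float, float]:
    # Assumed rule: offsets are proportions of the anchor's own size,
    # measured from the corresponding edge of the anchor.
    left = anchor.x1 + margin_left * anchor.width
    right = anchor.x2 + margin_right * anchor.width
    top = anchor.y1 + margin_top * anchor.height
    bottom = anchor.y2 + margin_bottom * anchor.height
    return left, top, right, bottom


def read_region(entries: List[Box], region: Tuple[float, float, float, float]) -> str:
    left, top, right, bottom = region
    # Keep only entries fully contained in the region.
    inside = [e for e in entries
              if e.x1 >= left and e.x2 <= right and e.y1 >= top and e.y2 <= bottom]
    # Join in reading order: top-to-bottom, then left-to-right.
    inside.sort(key=lambda e: (e.y1, e.x1))
    return " ".join(e.text for e in inside)


entries = [
    Box("Total:", 100, 200, 150, 210),
    Box("1,234.00", 160, 201, 220, 209),
    Box("Tax:", 100, 230, 140, 240),
]
# Look to the right of "Total:", with half a line of vertical slack.
region = region_from_anchor(entries[0], 1.0, 3.0, -0.5, 0.5)
print(region)                        # (150.0, 195.0, 300.0, 215.0)
print(read_region(entries, region))  # 1,234.00
```

Expressing margins relative to the anchor's size, rather than in absolute units, keeps the same read call valid when a document is rendered at a different scale.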
set_entries(self, entries, sort=True)
Set the list of entries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entries | List[Entry] | List of entries. | required |
sort | bool | Whether to sort the entries. Defaults to True. | True |
botcity.document_processing.pdf.pdfreader.PDFReader
page_height: float (read-only property)
PDF page height.
page_width: float (read-only property)
PDF page width.
read_file(self, file)
Read the given PDF file and return a new DocumentParser instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | str | PDF file path. | required |
Returns:
Type | Description |
---|---|
DocumentParser | The document parser to be used. |