pdftron::PDF::TextExtractor::Line Class Reference

TextExtractor::Line object represents a line of text on a PDF page.

#include <TextExtractor.h>

Public Member Functions

int GetNumWords ()
bool IsSimpleLine ()
const double * GetBBox ()
void GetQuad (double out_quad[8])
Word GetFirstWord ()
Word GetWord (int word_idx)
Line GetNextLine ()
int GetCurrentNum ()
Style GetStyle ()
int GetParagraphID ()
int GetFlowID ()
bool EndsWithHyphen ()
bool IsValid ()
bool operator== (const Line &)
bool operator!= (const Line &)
 Line ()


Detailed Description

TextExtractor::Line object represents a line of text on a PDF page.

Each line consists of a sequence of words, and each word is set in one or more styles.


Constructor & Destructor Documentation

pdftron::PDF::TextExtractor::Line::Line (  ) 


Member Function Documentation

int pdftron::PDF::TextExtractor::Line::GetNumWords (  ) 

Returns:
The number of words in this line.

bool pdftron::PDF::TextExtractor::Line::IsSimpleLine (  ) 

Returns:
true if this line is not rotated (i.e. if the quadrilaterals returned by GetBBox() and GetQuad() coincide), false otherwise.

const double* pdftron::PDF::TextExtractor::Line::GetBBox (  ) 

Returns:
The bounding box for this line (in unrotated page coordinates).
Note:
To account for the effect of page '/Rotate' attribute, transform all points using page.GetDefaultMatrix().
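
As a rough illustration of the note above, the following sketch applies a 2D affine matrix to the corners of a bounding box. Affine and TransformBBox are hypothetical names introduced here, not library types: Affine only stands in for whatever page.GetDefaultMatrix() returns. The sketch also assumes the box is laid out as four doubles (x1, y1, x2, y2), which this page does not state.

```cpp
#include <algorithm>

// Hypothetical stand-in for the matrix returned by page.GetDefaultMatrix().
// Uses the PDF convention: x' = a*x + c*y + h, y' = b*x + d*y + v.
struct Affine {
    double a, b, c, d, h, v;
    void Apply(double& x, double& y) const {
        const double nx = a * x + c * y + h;
        const double ny = b * x + d * y + v;
        x = nx;
        y = ny;
    }
};

// Transforms two opposite corners of a box (assumed layout: x1, y1, x2, y2)
// and writes a normalized box (min corner first) to out.
// Two corners are enough only for rotations by multiples of 90 degrees,
// which keep an axis-aligned rectangle axis-aligned.
inline void TransformBBox(const double* bbox, const Affine& m, double out[4]) {
    double x1 = bbox[0], y1 = bbox[1], x2 = bbox[2], y2 = bbox[3];
    m.Apply(x1, y1);
    m.Apply(x2, y2);
    out[0] = std::min(x1, x2);
    out[1] = std::min(y1, y2);
    out[2] = std::max(x1, x2);
    out[3] = std::max(y1, y2);
}
```

With the real library, the same arithmetic would presumably be done by the matrix object itself; the stand-in only makes the coordinate change explicit.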

void pdftron::PDF::TextExtractor::Line::GetQuad ( double  out_quad[8]  ) 

Parameters:
out_quad The quadrilateral representing a tight bounding box for this line (in unrotated page coordinates).
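
To show how the eight values written to out_quad relate to an axis-aligned box, here is a minimal sketch that computes the enclosing rectangle of a quadrilateral. QuadToBBox is a hypothetical helper, and the layout (four consecutive (x, y) pairs) is an assumption that this page does not confirm.

```cpp
#include <algorithm>

// Computes the axis-aligned box (x1, y1, x2, y2) enclosing a quadrilateral.
// Assumes quad holds four (x, y) pairs: x0, y0, x1, y1, x2, y2, x3, y3.
inline void QuadToBBox(const double quad[8], double out_bbox[4]) {
    out_bbox[0] = out_bbox[2] = quad[0];
    out_bbox[1] = out_bbox[3] = quad[1];
    for (int i = 1; i < 4; ++i) {
        out_bbox[0] = std::min(out_bbox[0], quad[2 * i]);
        out_bbox[1] = std::min(out_bbox[1], quad[2 * i + 1]);
        out_bbox[2] = std::max(out_bbox[2], quad[2 * i]);
        out_bbox[3] = std::max(out_bbox[3], quad[2 * i + 1]);
    }
}
```

For a rotated line the enclosing box is larger than the quadrilateral, which is why the tight quad and the bounding box coincide only for unrotated text.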

Word pdftron::PDF::TextExtractor::Line::GetFirstWord (  ) 

Returns:
The first word in this line.
Note:
To traverse the list of all words on this line use word.GetNextWord().
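
The traversal mentioned in the note could look roughly like the sketch below. FakeWord and FakeLine are hypothetical stand-ins that mimic only the calls used here, so that the example compiles on its own. It assumes that Word offers IsValid() and that GetNextWord() returns an invalid word once the end of the line is reached; neither is documented on this page.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for TextExtractor::Word: a cursor into a word list.
struct FakeWord {
    const std::vector<int>* words;  // placeholder payloads, not inspected
    std::size_t idx;
    bool IsValid() const { return words && idx < words->size(); }
    FakeWord GetNextWord() const { return FakeWord{words, idx + 1}; }
};

// Hypothetical stand-in for TextExtractor::Line.
struct FakeLine {
    std::vector<int> words;
    FakeWord GetFirstWord() const { return FakeWord{&words, 0}; }
    int GetNumWords() const { return static_cast<int>(words.size()); }
};

// Counts the words on a line by walking GetFirstWord() / GetNextWord()
// until an invalid word is returned.
template <class LineT>
int CountWordsByTraversal(LineT& line) {
    int count = 0;
    for (auto w = line.GetFirstWord(); w.IsValid(); w = w.GetNextWord()) {
        ++count;
    }
    return count;
}
```

Under those assumptions the count should agree with GetNumWords(); indexed access through GetWord() is the alternative when random access is needed.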

Word pdftron::PDF::TextExtractor::Line::GetWord ( int  word_idx  ) 

Returns:
The word at index word_idx in this line.

Line pdftron::PDF::TextExtractor::Line::GetNextLine (  ) 

Returns:
The next line on the page.

int pdftron::PDF::TextExtractor::Line::GetCurrentNum (  ) 

Returns:
The index of this line on the current page.

Style pdftron::PDF::TextExtractor::Line::GetStyle (  ) 

Returns:
The predominant style for this line.

int pdftron::PDF::TextExtractor::Line::GetParagraphID (  ) 

Returns:
The unique identifier for a paragraph or column that this line belongs to. This information can be used to identify which lines belong to which paragraphs.
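
One way to use these identifiers is to bucket line indices by paragraph while walking the page. The sketch below is illustrative only: FakeLine is a hypothetical stand-in for TextExtractor::Line, GroupByParagraph is a name introduced here, and the loop assumes that GetNextLine() returns an invalid line after the last line on the page.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for TextExtractor::Line: a cursor over paragraph IDs.
struct FakeLine {
    const std::vector<int>* para_ids;  // paragraph ID of each line on the page
    std::size_t idx;
    bool IsValid() const { return para_ids && idx < para_ids->size(); }
    FakeLine GetNextLine() const { return FakeLine{para_ids, idx + 1}; }
    int GetParagraphID() const { return (*para_ids)[idx]; }
    int GetCurrentNum() const { return static_cast<int>(idx); }
};

// Maps each paragraph ID to the indices of the lines that carry it,
// walking the page from `first` via GetNextLine().
template <class LineT>
std::map<int, std::vector<int>> GroupByParagraph(LineT first) {
    std::map<int, std::vector<int>> groups;
    for (LineT line = first; line.IsValid(); line = line.GetNextLine()) {
        groups[line.GetParagraphID()].push_back(line.GetCurrentNum());
    }
    return groups;
}
```

The same loop keyed on GetFlowID() instead of GetParagraphID() would group lines by flow.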

int pdftron::PDF::TextExtractor::Line::GetFlowID (  ) 

Returns:
The unique identifier for the flow that this line belongs to. This information can be used to identify which lines/paragraphs belong to which flows.

bool pdftron::PDF::TextExtractor::Line::EndsWithHyphen (  ) 

Returns:
true if this line of text ends with a hyphen (i.e. '-'), false otherwise.
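
A typical use of this flag is rejoining words that were split across lines. The following is a minimal sketch, not the library's behaviour: LineText and JoinLines are hypothetical, and each record is assumed to hold a line's already extracted text together with the value EndsWithHyphen() reported for it.

```cpp
#include <string>
#include <vector>

// Hypothetical per-line record: the extracted text plus the value that
// EndsWithHyphen() reported for that line.
struct LineText {
    std::string text;
    bool ends_with_hyphen;
};

// Joins lines into one string. A trailing hyphen is dropped and the next
// line is appended without a space; otherwise lines are separated by a space.
inline std::string JoinLines(const std::vector<LineText>& lines) {
    std::string out;
    bool glue = true;  // true: append the next line without a separator
    for (const LineText& l : lines) {
        if (!glue) out += ' ';
        if (l.ends_with_hyphen && !l.text.empty() && l.text.back() == '-') {
            out.append(l.text, 0, l.text.size() - 1);
            glue = true;
        } else {
            out += l.text;
            glue = false;
        }
    }
    return out;
}
```

Dropping the hyphen is a heuristic: a genuine compound such as "well-known" that happens to break at the hyphen would be merged into one word, so callers may prefer a dictionary check before gluing.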

bool pdftron::PDF::TextExtractor::Line::IsValid (  ) 

Returns:
true if this is a valid line, false otherwise.

bool pdftron::PDF::TextExtractor::Line::operator== ( const Line &  ) 

bool pdftron::PDF::TextExtractor::Line::operator!= ( const Line &  ) 


© 2002-2010 PDFTron Systems Inc.