Span class
A slice from a Doc object.
Span.__init__ method
Create a Span object from the slice doc[start : end].
| Name | Type | Description |
|---|---|---|
| doc | Doc | The parent document. |
| start | int | The index of the first token of the span. |
| end | int | The index of the first token after the span. |
| label | int / unicode | A label to attach to the span, e.g. for named entities. As of v2.1, the label can also be a unicode string. |
| kb_id | int / unicode | A knowledge base ID to attach to the span, e.g. for named entities. The ID can be an integer or a unicode string. |
| vector | numpy.ndarray[ndim=1, dtype='float32'] | A meaning representation of the span. |
| RETURNS | Span | The newly constructed object. |
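As a minimal sketch of the constructor, assuming spaCy v2.1 or later (so that a string label is accepted), the following builds a Doc by hand from a list of words instead of running a pipeline. The words and the label name EXAMPLE are made up for illustration.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Build a small Doc by hand; the words are illustrative only.
doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])

# Covers tokens 1, 2 and 3; the end index 4 is exclusive.
span = Span(doc, 1, 4, label="EXAMPLE")

print(span.text)    # it back !
print(span.label_)  # EXAMPLE
```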
Span.__getitem__ method
Get a Token object.
| Name | Type | Description |
|---|---|---|
| i | int | The index of the token within the span. |
| RETURNS | Token | The token at span[i]. |
Get a Span object.
| Name | Type | Description |
|---|---|---|
| start_end | tuple | The slice of the span to get. |
| RETURNS | Span | The span at span[start : end]. |
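The two forms of indexing can be sketched as follows. Indices are relative to the span, not to the parent document. The hand-built Doc and its words are assumptions made for the example.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = doc[1:6]  # it back ! He pleaded

token = span[1]       # integer index returns a Token
sub_span = span[1:3]  # slice returns a Span

print(token.text)     # back
print(sub_span.text)  # back !
```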
Span.__iter__ method
Iterate over Token objects.
| Name | Type | Description |
|---|---|---|
| YIELDS | Token | A Token object. |
Span.__len__ method
Get the number of tokens in the span.
| Name | Type | Description |
|---|---|---|
| RETURNS | int | The number of tokens in the span. |
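Iteration and length can be illustrated together. This is a small sketch over a hand-built Doc; the words are illustrative.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = doc[1:4]

# Iterating over the span yields its Token objects in order.
texts = [token.text for token in span]

print(texts)      # ['it', 'back', '!']
print(len(span))  # 3
```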
Span.set_extension classmethod (v2.0)
Define a custom attribute on the Span which becomes available via Span._.
For details, see the documentation on
custom attributes.
| Name | Type | Description |
|---|---|---|
| name | unicode | Name of the attribute to set by the extension. For example, 'my_attr' will be available as span._.my_attr. |
| default | - | Optional default value of the attribute if no getter or method is defined. |
| method | callable | Set a custom method on the object, for example span._.compare(other_span). |
| getter | callable | Getter function that takes the object and returns an attribute value. Is called when the user accesses the ._ attribute. |
| setter | callable | Setter function that takes the Span and a value, and modifies the object. Is called when the user writes to the Span._ attribute. |
| force | bool | Force overwriting existing attribute. |
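The three kinds of extension (default value, getter, method) might be registered as in the sketch below. The attribute names is_reviewed, n_chars and starts_with are hypothetical and chosen only for this example.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# force=True lets the snippet be re-run without an "already registered" error.
Span.set_extension("is_reviewed", default=False, force=True)
Span.set_extension("n_chars", getter=lambda span: len(span.text), force=True)
Span.set_extension(
    "starts_with",
    method=lambda span, prefix: span.text.startswith(prefix),
    force=True,
)

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = doc[1:4]  # it back !

span._.is_reviewed = True         # overrides the default for this span
print(span._.is_reviewed)         # True
print(span._.n_chars)             # 9
print(span._.starts_with("it"))   # True
```

A getter is computed on every access, whereas a default value is stored per span and can be overwritten.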
Span.get_extension classmethod (v2.0)
Look up a previously registered extension by name. Returns a 4-tuple
(default, method, getter, setter) if the extension is registered. Raises a
KeyError otherwise.
| Name | Type | Description |
|---|---|---|
| name | unicode | Name of the extension. |
| RETURNS | tuple | A (default, method, getter, setter) tuple of the extension. |
Span.has_extension classmethod (v2.0)
Check whether an extension has been registered on the Span class.
| Name | Type | Description |
|---|---|---|
| name | unicode | Name of the extension to check. |
| RETURNS | bool | Whether the extension has been registered. |
Span.remove_extension classmethod (v2.0.12)
Remove a previously registered extension.
| Name | Type | Description |
|---|---|---|
| name | unicode | Name of the extension. |
| RETURNS | tuple | A (default, method, getter, setter) tuple of the removed extension. |
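The life cycle of an extension (register, check, look up, remove) can be sketched as follows. The attribute name demo_attr is hypothetical.

```python
from spacy.tokens import Span

Span.set_extension("demo_attr", default=42, force=True)

registered = Span.has_extension("demo_attr")
default, method, getter, setter = Span.get_extension("demo_attr")
removed = Span.remove_extension("demo_attr")
still_registered = Span.has_extension("demo_attr")

print(registered)        # True
print(default)           # 42
print(removed)           # (42, None, None, None)
print(still_registered)  # False
```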
Span.char_span method (v2.2.4)
Create a Span object from the slice span.text[start:end]. Returns None if
the character indices don’t map to a valid span.
| Name | Type | Description |
|---|---|---|
| start | int | The index of the first character of the span. |
| end | int | The index of the first character after the span. |
| label | uint64 / unicode | A label to attach to the span, e.g. for named entities. |
| kb_id | uint64 / unicode | An ID from a knowledge base to capture the meaning of a named entity. |
| vector | numpy.ndarray[ndim=1, dtype='float32'] | A meaning representation of the span. |
| RETURNS | Span | The newly constructed object or None. |
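A short sketch of character-based slicing, assuming spaCy v2.2.4 or later. The offsets are relative to span.text, and offsets that do not fall on token boundaries give None. The Doc is built by hand and its words are illustrative.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = doc[1:4]  # span.text is "it back !"

# Characters 3 to 7 of span.text cover exactly the token "back".
inner = span.char_span(3, 7)
print(inner.text)  # back

# Characters 4 to 7 ("ack") start in the middle of a token.
misaligned = span.char_span(4, 7)
print(misaligned)  # None
```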
Span.similarity method (needs model)
Make a semantic similarity estimate. The default estimate is cosine similarity using an average of word vectors.
| Name | Type | Description |
|---|---|---|
| other | - | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. |
| RETURNS | float | A scalar similarity score. Higher is more similar. |
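The default estimate can be reproduced by hand. The sketch below uses plain NumPy and made-up 3-dimensional vectors, not spaCy's own implementation: each span is represented by the average of its token vectors, and the score is the cosine of the angle between the two averages. The helper names average_vector and cosine_similarity are hypothetical, and returning 0.0 for a zero vector is an assumption of this sketch.

```python
import numpy as np


def average_vector(token_vectors):
    """Average a list of token vectors into one span vector."""
    return np.mean(np.asarray(token_vectors, dtype="float32"), axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either has zero length."""
    norm_a = np.linalg.norm(a)  # L2 norm
    norm_b = np.linalg.norm(b)
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))


span_a = average_vector([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # [0.5, 0.5, 0.0]
span_b = average_vector([[1.0, 1.0, 0.0]])                   # same direction as span_a
span_c = average_vector([[0.0, 0.0, 2.0]])                   # orthogonal to span_a

print(cosine_similarity(span_a, span_b))  # approximately 1.0
print(cosine_similarity(span_a, span_c))  # 0.0
```

Because the score depends only on direction, span_a and span_b count as maximally similar even though their vectors have different lengths.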
Span.get_lca_matrix method
Calculate the lowest common ancestor (LCA) matrix for the span. The matrix
contains, for each pair of tokens, the integer index of their lowest common
ancestor, or -1 if no common ancestor is found, e.g. if the span excludes a
necessary ancestor.
| Name | Type | Description |
|---|---|---|
| RETURNS | numpy.ndarray[ndim=2, dtype='int32'] | The lowest common ancestor matrix of the Span. |
Span.to_array method (v2.0)
Given a list of M attribute IDs, export the tokens to a numpy ndarray of
shape (N, M), where N is the length of the span. The values will be
32-bit integers.
| Name | Type | Description |
|---|---|---|
| attr_ids | list | A list of attribute ID ints. |
| RETURNS | numpy.ndarray[long, ndim=2] | A feature matrix, with one row per word, and one column per attribute indicated in the input attr_ids. |
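As an illustration, the sketch below exports two attributes, ORTH and LENGTH from spacy.attrs, for a three-token span, giving one row per token and one column per attribute. The hand-built Doc is an assumption made for the example.

```python
from spacy.attrs import LENGTH, ORTH
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = doc[1:4]  # it back !

features = span.to_array([ORTH, LENGTH])

print(features.shape)           # (3, 2)
print(features[:, 1].tolist())  # [2, 4, 1]  (character length of each token)

# The ORTH column holds the hash IDs of the token texts.
print(int(features[0, 0]) == doc.vocab.strings["it"])  # True
```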
Span.merge method
Retokenize the document, such that the span is merged into a single token.
| Name | Type | Description |
|---|---|---|
| **attributes | - | Attributes to assign to the merged token. By default, attributes are inherited from the syntactic root token of the span. |
| RETURNS | Token | The newly merged token. |
Span.ents property (v2.0.13, needs model)
The named entities in the span. Returns a tuple of named entity Span objects,
if the entity recognizer has been applied.
| Name | Type | Description |
|---|---|---|
| RETURNS | tuple | Entities in the span, one Span per entity. |
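In the sketch below the entities are assigned by hand as a stand-in for an entity recognizer, so that no trained model is needed. The words and labels are illustrative. Only entities that lie entirely inside the span are reported.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Alice", "met", "Bob", "in", "Paris", "today"])

# Stand-in for the entity recognizer: set the document's entities manually.
doc.ents = [
    Span(doc, 0, 1, label="PERSON"),
    Span(doc, 2, 3, label="PERSON"),
    Span(doc, 4, 5, label="GPE"),
]

span = doc[1:5]  # met Bob in Paris
found = [(ent.text, ent.label_) for ent in span.ents]

print(found)  # [('Bob', 'PERSON'), ('Paris', 'GPE')]
```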
Span.as_doc method
Create a new Doc object corresponding to the Span, with a copy of the data.
| Name | Type | Description |
|---|---|---|
| copy_user_data | bool | Whether or not to copy the original doc’s user data. |
| RETURNS | Doc | A Doc object of the Span’s content. |
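A minimal sketch of turning a span into a standalone document. The new Doc contains only the span's tokens and is a separate object from the parent. The hand-built Doc is an assumption made for the example.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = doc[1:4]  # it back !

new_doc = span.as_doc()

print(type(new_doc).__name__)             # Doc
print([token.text for token in new_doc])  # ['it', 'back', '!']
print(new_doc is doc)                     # False
```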
Span.root property (needs model)
The token with the shortest path to the root of the sentence (or the root itself). If multiple tokens are equally high in the tree, the first token is taken.
| Name | Type | Description |
|---|---|---|
| RETURNS | Token | The root token. |
Span.conjuncts property (needs model)
A tuple of tokens coordinated to span.root.
| Name | Type | Description |
|---|---|---|
| RETURNS | tuple | The coordinated tokens. |
Span.lefts property (needs model)
Tokens that are to the left of the span, whose heads are within the span.
| Name | Type | Description |
|---|---|---|
| YIELDS | Token | A left-child of a token of the span. |
Span.rights property (needs model)
Tokens that are to the right of the span, whose heads are within the span.
| Name | Type | Description |
|---|---|---|
| YIELDS | Token | A right-child of a token of the span. |
Span.n_lefts property (needs model)
The number of tokens that are to the left of the span, whose heads are within the span.
| Name | Type | Description |
|---|---|---|
| RETURNS | int | The number of left-child tokens. |
Span.n_rights property (needs model)
The number of tokens that are to the right of the span, whose heads are within the span.
| Name | Type | Description |
|---|---|---|
| RETURNS | int | The number of right-child tokens. |
Span.subtree property (needs model)
Tokens within the span and tokens which descend from them.
| Name | Type | Description |
|---|---|---|
| YIELDS | Token | A token within the span, or a descendant from it. |
Span.has_vector property (needs model)
A boolean value indicating whether a word vector is associated with the object.
| Name | Type | Description |
|---|---|---|
| RETURNS | bool | Whether the span has vector data attached. |
Span.vector property (needs model)
A real-valued meaning representation. Defaults to an average of the token vectors.
| Name | Type | Description |
|---|---|---|
| RETURNS | numpy.ndarray[ndim=1, dtype='float32'] | A 1D numpy array representing the span’s semantics. |
Span.vector_norm property (needs model)
The L2 norm of the span’s vector representation.
| Name | Type | Description |
|---|---|---|
| RETURNS | float | The L2 norm of the vector representation. |
Attributes
| Name | Type | Description |
|---|---|---|
| doc | Doc | The parent document. |
| tensor (v2.1.7) | ndarray | The span’s slice of the parent Doc’s tensor. |
| sent | Span | The sentence span that this span is a part of. |
| start | int | The token offset for the start of the span. |
| end | int | The token offset for the end of the span. |
| start_char | int | The character offset for the start of the span. |
| end_char | int | The character offset for the end of the span. |
| text | unicode | A unicode representation of the span text. |
| text_with_ws | unicode | The text content of the span with a trailing whitespace character if the last token has one. |
| orth | int | ID of the verbatim text content. |
| orth_ | unicode | Verbatim text content (identical to Span.text). Exists mostly for consistency with the other attributes. |
| label | int | The hash value of the span’s label. |
| label_ | unicode | The span’s label. |
| lemma_ | unicode | The span’s lemma. |
| kb_id | int | The hash value of the knowledge base ID referred to by the span. |
| kb_id_ | unicode | The knowledge base ID referred to by the span. |
| ent_id | int | The hash value of the named entity the span is an instance of. |
| ent_id_ | unicode | The string ID of the named entity the span is an instance of. |
| sentiment | float | A scalar value indicating the positivity or negativity of the span. |
| _ | Underscore | User space for adding custom attribute extensions. |
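The relationship between the token offsets, character offsets and text attributes can be seen in a small sketch. The hand-built Doc (where every word is followed by a single space) and the label EXAMPLE are assumptions made for illustration.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = Span(doc, 1, 4, label="EXAMPLE")

print(span.start, span.end)            # 1 4   (token offsets, end exclusive)
print(span.start_char, span.end_char)  # 5 14  (character offsets into doc.text)
print(repr(span.text))                 # 'it back !'
print(repr(span.text_with_ws))         # 'it back ! '

# The character offsets slice the parent document's text.
print(doc.text[span.start_char:span.end_char] == span.text)  # True

# label is the hash of label_ in the shared string store.
print(span.label == doc.vocab.strings["EXAMPLE"])  # True
```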