Span

A slice from a Doc object.

Span.__init__ method

Create a Span object from the slice doc[start : end].

Name	Type	Description
`doc`	`Doc`	The parent document.
`start`	int	The index of the first token of the span.
`end`	int	The index of the first token after the span.
`label`	int / unicode	A label to attach to the span, e.g. for named entities. As of v2.1, the label can also be a unicode string.
`kb_id`	int / unicode	A knowledge base ID to attach to the span, e.g. for named entities. The ID can be an integer or a unicode string.
`vector`	`numpy.ndarray[ndim=1, dtype='float32']`	A meaning representation of the span.
RETURNS	`Span`	The newly constructed object.
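A minimal sketch of constructing a span directly. It assumes a blank English pipeline (tokenizer only), so no trained model is required; the sentence and the `GPE` label are illustrative choices.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only; no trained model is needed
doc = nlp("I like New York in Autumn")

# Tokens 2 and 3 ("New", "York"); the end index 4 is exclusive.
span = Span(doc, 2, 4, label="GPE")

print(span.text, span.start, span.end, span.label_)
```

Passing the label as a string relies on the v2.1 behavior described in the table above.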

Span.__getitem__ method

Get a Token object.

Name	Type	Description
`i`	int	The index of the token within the span.
RETURNS	`Token`	The token at `span[i]`.

Get a Span object.

Name	Type	Description
`start_end`	tuple	The slice of the span to get.
RETURNS	`Span`	The span at `span[start : end]`.
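Both access forms can be sketched as follows, assuming a blank English pipeline and an illustrative sentence. Integer indices are relative to the start of the span, not the start of the document.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]  # tokens "it", "back", "!"

token = span[1]       # integer index -> Token, counted from the span start
last = span[-1]       # negative indices count back from the span end
sub_span = span[1:3]  # slice -> a new Span over the same Doc

print(token.text, last.text, sub_span.text)
```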

Span.__iter__ method

Iterate over Token objects.

Name	Type	Description
YIELDS	`Token`	A `Token` object.

Span.__len__ method

Get the number of tokens in the span.

Name	Type	Description
RETURNS	int	The number of tokens in the span.
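A short sketch of iterating over a span and taking its length, again assuming a blank English pipeline and an illustrative sentence.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only
doc = nlp("I like New York in Autumn")
span = doc[2:5]

# Iteration yields the Token objects inside the span, in order.
texts = [token.text for token in span]

# The length is the number of tokens, i.e. span.end - span.start.
n_tokens = len(span)

print(texts, n_tokens)
```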

Span.set_extension classmethod v2.0

Define a custom attribute on the Span which becomes available via Span._. For details, see the documentation on custom attributes.

Example

from spacy.tokens import Span
city_getter = lambda span: any(city in span.text for city in ("New York", "Paris", "Berlin"))
Span.set_extension("has_city", getter=city_getter)
doc = nlp("I like New York in Autumn")
assert doc[1:4]._.has_city

Name	Type	Description
`name`	unicode	Name of the attribute to set by the extension. For example, `'my_attr'` will be available as `span._.my_attr`.
`default`	-	Optional default value of the attribute if no getter or method is defined.
`method`	callable	Set a custom method on the object, for example `span._.compare(other_span)`.
`getter`	callable	Getter function that takes the object and returns an attribute value. Is called when the user accesses the `._` attribute.
`setter`	callable	Setter function that takes the `Span` and a value, and modifies the object. Is called when the user writes to the `Span._` attribute.
`force`	bool	Force overwriting existing attribute.
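The `default` and `method` options can be sketched in the same way as the getter example. The attribute names `note` and `n_chars` and the helper function are hypothetical, chosen only for illustration; a blank English pipeline is assumed.

```python
import spacy
from spacy.tokens import Span


def n_chars(span, include_ws=False):
    """Hypothetical helper: count the characters in the span text."""
    return len(span.text_with_ws if include_ws else span.text)


# force=True lets the snippet be re-run without an "already registered" error.
Span.set_extension("note", default=None, force=True)
Span.set_extension("n_chars", method=n_chars, force=True)

nlp = spacy.blank("en")
doc = nlp("I like New York in Autumn")
span = doc[2:4]

span._.note = "city"       # overrides the default for this span only
length = span._.n_chars()  # the span is passed as the first argument
length_ws = span._.n_chars(include_ws=True)

print(span._.note, length, length_ws)
```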

Span.get_extension classmethod v2.0

Look up a previously registered extension by name. Returns a 4-tuple (default, method, getter, setter) if the extension is registered. Raises a KeyError otherwise.

Name	Type	Description
`name`	unicode	Name of the extension.
RETURNS	tuple	A `(default, method, getter, setter)` tuple of the extension.

Span.has_extension classmethod v2.0

Check whether an extension has been registered on the Span class.

Name	Type	Description
`name`	unicode	Name of the extension to check.
RETURNS	bool	Whether the extension has been registered.

Span.remove_extension classmethod v2.0.12

Remove a previously registered extension.

Name	Type	Description
`name`	unicode	Name of the extension.
RETURNS	tuple	A `(default, method, getter, setter)` tuple of the removed extension.
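The lookup, check and removal calls fit together roughly as below. The extension name `is_reviewed` is hypothetical.

```python
from spacy.tokens import Span

# Register a hypothetical extension with a default value.
Span.set_extension("is_reviewed", default=False, force=True)
registered = Span.has_extension("is_reviewed")

# The registered extension is described by a 4-tuple.
default, method, getter, setter = Span.get_extension("is_reviewed")

# Removing the extension returns the same kind of tuple.
removed = Span.remove_extension("is_reviewed")
still_registered = Span.has_extension("is_reviewed")

print(registered, default, still_registered)
```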

Span.char_span method v2.2.4

Create a Span object from the slice span.text[start:end]. Returns None if the character indices don’t map to a valid span.

Name	Type	Description
`start`	int	The index of the first character of the span.
`end`	int	The index of the first character after the span.
`label`	uint64 / unicode	A label to attach to the span, e.g. for named entities.
`kb_id`	uint64 / unicode	An ID from a knowledge base to capture the meaning of a named entity.
`vector`	`numpy.ndarray[ndim=1, dtype='float32']`	A meaning representation of the span.
RETURNS	`Span`	The newly constructed object or `None`.
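A minimal sketch of the character-offset lookup, assuming a blank English pipeline. The offsets are relative to `span.text`, not to the text of the parent document.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only
doc = nlp("I like New York in Autumn")
span = doc[1:5]  # "like New York in"

# "New York" occupies characters 5 to 13 of span.text.
city = span.char_span(5, 13)

# Offsets that cut through a token do not map to a valid span.
invalid = span.char_span(5, 11)

print(city.text, invalid)
```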

Span.similarity method (needs model)

Make a semantic similarity estimate. The default estimate is cosine similarity using an average of word vectors.

Example

doc = nlp("green apples and red oranges")
green_apples = doc[:2]
red_oranges = doc[3:]
apples_oranges = green_apples.similarity(red_oranges)
oranges_apples = red_oranges.similarity(green_apples)
assert apples_oranges == oranges_apples

Name	Type	Description
`other`	-	The object to compare with. By default, accepts `Doc`, `Span`, `Token` and `Lexeme` objects.
RETURNS	float	A scalar similarity score. Higher is more similar.
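To make the default estimate concrete without downloading a model, the sketch below attaches toy vectors to a blank vocabulary and compares the result with a hand-computed cosine of the averaged vectors. The vector values and the `cosine` helper are invented for illustration.

```python
import numpy
import spacy

nlp = spacy.blank("en")

# Toy 3-dimensional vectors, invented purely for illustration.
toy_vectors = {
    "green": [1.0, 0.0, 0.0],
    "apples": [0.0, 1.0, 0.0],
    "red": [1.0, 0.0, 1.0],
    "oranges": [0.0, 1.0, 1.0],
}
for word, values in toy_vectors.items():
    nlp.vocab.set_vector(word, numpy.asarray(values, dtype="float32"))

doc = nlp("green apples and red oranges")
green_apples = doc[:2]
red_oranges = doc[3:]


def cosine(a, b):
    """Hypothetical helper: cosine similarity of two 1D arrays."""
    return float(numpy.dot(a, b) / (numpy.linalg.norm(a) * numpy.linalg.norm(b)))


# span.vector defaults to the average of the token vectors.
manual = cosine(green_apples.vector, red_oranges.vector)
score = green_apples.similarity(red_oranges)

print(score, manual)
```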

Span.get_lca_matrix method

Calculates the lowest common ancestor (LCA) matrix for a given Span. Returns an LCA matrix containing the integer index of the lowest common ancestor of each pair of tokens, or -1 if no common ancestor is found, e.g. if the span excludes a necessary ancestor.

Name	Type	Description
RETURNS	`numpy.ndarray[ndim=2, dtype='int32']`	The lowest common ancestor matrix of the `Span`.
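What the matrix encodes can be illustrated without a parser. The sketch below is not spaCy's implementation: it is plain Python over a hypothetical list of head indices, where `heads[i]` is the span-relative index of token i's head, a root points to itself, and -1 marks a head that lies outside the span.

```python
def lca_matrix(heads):
    """Plain-Python sketch: lowest common ancestor index for every token pair."""

    def chain(i):
        # Token i followed by its ancestors, stopping at a root or at
        # a head that lies outside the span (-1).
        path = [i]
        while heads[path[-1]] not in (path[-1], -1):
            path.append(heads[path[-1]])
        return path

    chains = [chain(i) for i in range(len(heads))]
    matrix = []
    for i in range(len(heads)):
        row = []
        for j in range(len(heads)):
            ancestors_j = set(chains[j])
            # The first shared node on i's path is the lowest common one.
            row.append(next((a for a in chains[i] if a in ancestors_j), -1))
        matrix.append(row)
    return matrix


# Hypothetical parse of the span "like New York":
# "like" is the root, "New" attaches to "York", "York" attaches to "like".
inside = lca_matrix([0, 2, 0])

# Hypothetical span "New York in" whose shared ancestor "like" is excluded.
excluded = lca_matrix([1, -1, -1])

print(inside)
print(excluded)
```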

Span.to_array method v2.0

Given a list of M attribute IDs, export the tokens to a numpy ndarray of shape (N, M), where N is the length of the span. The values will be 64-bit unsigned integers.

Example

from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
doc = nlp("I like New York in Autumn.")
span = doc[2:3]
# All strings mapped to integers, for easy export to numpy
np_array = span.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])

Name	Type	Description
`attr_ids`	list	A list of attribute ID ints.
RETURNS	`numpy.ndarray[long, ndim=2]`	A feature matrix, with one row per word, and one column per attribute indicated in the input `attr_ids`.
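A sketch of the resulting shape and contents, assuming a blank English pipeline and attributes that need no trained model (`LOWER`, `IS_ALPHA`, `LENGTH`).

```python
import spacy
from spacy.attrs import IS_ALPHA, LENGTH, LOWER

nlp = spacy.blank("en")  # tokenizer only
doc = nlp("I like New York in Autumn.")
span = doc[2:4]  # "New York"

# One row per token in the span, one column per requested attribute.
np_array = span.to_array([LOWER, IS_ALPHA, LENGTH])

print(np_array.shape)
```

The `LOWER` column holds the hash of the lowercase form, which matches `Token.lower` for the corresponding token.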

Span.merge method

Retokenize the document, such that the span is merged into a single token.

Name	Type	Description
`**attributes`	-	Attributes to assign to the merged token. By default, attributes are inherited from the syntactic root token of the span.
RETURNS	`Token`	The newly merged token.
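In spaCy v2.1 and later, `Span.merge` is considered deprecated in favor of the `Doc.retokenize` context manager. The sketch below shows the equivalent merge through that route, assuming a blank English pipeline; the `LEMMA` value is an illustrative attribute override.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only
doc = nlp("I like New York in Autumn")
n_before = len(doc)

# Merge "New York" into one token and override its lemma.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[2:4], attrs={"LEMMA": "New York"})

print(n_before, len(doc), doc[2].text)
```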

Span.ents property v2.0.13 (needs model)

The named entities in the span. Returns a tuple of named entity Span objects, if the entity recognizer has been applied.

Example

doc = nlp("Mr. Best flew to New York on Saturday morning.")
span = doc[0:6]
ents = list(span.ents)
assert ents[0].label == doc.vocab.strings["PERSON"]
assert ents[0].label_ == "PERSON"
assert ents[0].text == "Mr. Best"

Name	Type	Description
RETURNS	tuple	Entities in the span, one `Span` per entity.
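The filtering behavior can be shown without an entity recognizer by setting `doc.ents` by hand. This is a sketch under that assumption; the entity boundaries and labels are assigned manually rather than predicted.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only; entities are set manually below
doc = nlp("Mr. Best flew to New York on Saturday morning.")
doc.ents = [
    Span(doc, 0, 2, label="PERSON"),  # "Mr. Best"
    Span(doc, 4, 6, label="GPE"),     # "New York"
    Span(doc, 7, 8, label="DATE"),    # "Saturday"
]

span = doc[0:6]  # "Mr. Best flew to New York"

# Only entities that lie fully inside the span are returned.
ents = list(span.ents)

print([(ent.text, ent.label_) for ent in ents])
```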

Span.as_doc method

Create a new Doc object corresponding to the Span, with a copy of the data.

Name	Type	Description
`copy_user_data`	bool	Whether or not to copy the original doc’s user data.
RETURNS	`Doc`	A `Doc` object of the `Span`’s content.
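A minimal sketch of copying a span into a standalone document, assuming a blank English pipeline. Token indices in the new `Doc` start again at 0.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only
doc = nlp("I like New York in Autumn.")
span = doc[2:4]

new_doc = span.as_doc()

print([token.text for token in new_doc], new_doc[0].i, len(doc))
```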

Span.root property (needs model)

The token with the shortest path to the root of the sentence (or the root itself). If multiple tokens are equally high in the tree, the first token is taken.

Example

doc = nlp("I like New York in Autumn.")
i, like, new, york, in_, autumn, dot = range(len(doc))
assert doc[new].head.text == "York"
assert doc[york].head.text == "like"
new_york = doc[new:york+1]
assert new_york.root.text == "York"

Name	Type	Description
RETURNS	`Token`	The root token.
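The selection rule can be sketched in plain Python. This is not spaCy's implementation, only an illustration of the definition above; the `heads` list is a hypothetical parse in which each entry is the document index of the token's head and the sentence root points to itself.

```python
def span_root(heads, start, end):
    """Index of the token in [start, end) with the shortest path to the root."""

    def depth(i):
        steps = 0
        while heads[i] != i:  # the sentence root is its own head
            i = heads[i]
            steps += 1
        return steps

    # min() keeps the first minimum, so ties go to the earliest token.
    return min(range(start, end), key=depth)


words = ["I", "like", "New", "York", "in", "Autumn", "."]
heads = [1, 1, 3, 1, 1, 4, 1]  # hypothetical parse, "like" is the root

root_of_new_york = span_root(heads, 2, 4)  # span "New York"
root_of_tie = span_root(heads, 3, 5)       # "York" and "in" are equally high

print(words[root_of_new_york], words[root_of_tie])
```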

Span.conjuncts property (needs model)

A tuple of tokens coordinated to span.root.

Name	Type	Description
RETURNS	`tuple`	The coordinated tokens.

Span.lefts property (needs model)

Tokens that are to the left of the span, whose heads are within the span.

Name	Type	Description
YIELDS	`Token`	A left-child of a token of the span.

Span.rights property (needs model)

Tokens that are to the right of the span, whose heads are within the span.

Name	Type	Description
YIELDS	`Token`	A right-child of a token of the span.

Span.n_lefts property (needs model)

The number of tokens that are to the left of the span, whose heads are within the span.

Name	Type	Description
RETURNS	int	The number of left-child tokens.

Span.n_rights property (needs model)

The number of tokens that are to the right of the span, whose heads are within the span.

Name	Type	Description
RETURNS	int	The number of right-child tokens.
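The definitions of `lefts`, `rights`, `n_lefts` and `n_rights` can be illustrated in plain Python. This sketch does not call spaCy and makes no claim about the order in which the real properties yield tokens; the `heads` list is a hypothetical parse in which each entry is the document index of the token's head.

```python
def outside_children(heads, start, end):
    """Tokens outside [start, end) whose head lies inside it."""
    lefts = [i for i in range(0, start) if start <= heads[i] < end]
    rights = [i for i in range(end, len(heads)) if start <= heads[i] < end]
    return lefts, rights


words = ["I", "like", "New", "York", "in", "Autumn", "."]
heads = [1, 1, 3, 1, 1, 4, 1]  # hypothetical parse, "like" is the root

# Span "like New York" covers token indices 1 to 3.
lefts, rights = outside_children(heads, 1, 4)
n_lefts, n_rights = len(lefts), len(rights)

print([words[i] for i in lefts], [words[i] for i in rights])
```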

Span.subtree property (needs model)

Tokens within the span and tokens which descend from them.

Name	Type	Description
YIELDS	`Token`	A token within the span, or a descendant from it.

Span.has_vector property (needs model)

A boolean value indicating whether a word vector is associated with the object.

Name	Type	Description
RETURNS	bool	Whether the span has vector data attached.

Span.vector property (needs model)

A real-valued meaning representation. Defaults to an average of the token vectors.

Name	Type	Description
RETURNS	`numpy.ndarray[ndim=1, dtype='float32']`	A 1D numpy array representing the span’s semantics.

Span.vector_norm property (needs model)

The L2 norm of the span’s vector representation.

Name	Type	Description
RETURNS	float	The L2 norm of the vector representation.
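The three vector properties relate to each other as sketched below. Instead of a trained model, the example assumes a blank vocabulary with two toy 2-dimensional vectors invented for illustration.

```python
import numpy
import spacy

nlp = spacy.blank("en")

# Toy vectors, invented purely for illustration.
nlp.vocab.set_vector("New", numpy.asarray([1.0, 0.0], dtype="float32"))
nlp.vocab.set_vector("York", numpy.asarray([0.0, 1.0], dtype="float32"))

doc = nlp("I like New York")
span = doc[2:4]

# By default the span vector is the average of its token vectors,
# and vector_norm is the L2 norm of that average.
expected_vector = numpy.mean([token.vector for token in span], axis=0)
expected_norm = float(numpy.linalg.norm(expected_vector))

print(span.has_vector, span.vector, span.vector_norm)
```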

Attributes

Name	Type	Description
`doc`	`Doc`	The parent document.
`tensor` v2.1.7	`ndarray`	The span’s slice of the parent `Doc`’s tensor.
`sent`	`Span`	The sentence span that this span is a part of.
`start`	int	The token offset for the start of the span.
`end`	int	The token offset for the end of the span.
`start_char`	int	The character offset for the start of the span.
`end_char`	int	The character offset for the end of the span.
`text`	unicode	A unicode representation of the span text.
`text_with_ws`	unicode	The text content of the span with a trailing whitespace character if the last token has one.
`orth`	int	ID of the verbatim text content.
`orth_`	unicode	Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes.
`label`	int	The hash value of the span’s label.
`label_`	unicode	The span’s label.
`lemma_`	unicode	The span’s lemma.
`kb_id`	int	The hash value of the knowledge base ID referred to by the span.
`kb_id_`	unicode	The knowledge base ID referred to by the span.
`ent_id`	int	The hash value of the named entity the span is an instance of.
`ent_id_`	unicode	The string ID of the named entity the span is an instance of.
`sentiment`	float	A scalar value indicating the positivity or negativity of the span.
`_`	`Underscore`	User space for adding custom attribute extensions.
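A short sketch of how the token offsets, character offsets and text attributes line up, assuming a blank English pipeline and an illustrative sentence.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only
doc = nlp("I like New York in Autumn.")
span = doc[2:4]

# Token offsets index into the Doc; character offsets index into doc.text.
token_offsets = (span.start, span.end)
char_offsets = (span.start_char, span.end_char)
from_chars = doc.text[span.start_char:span.end_char]

print(token_offsets, char_offsets, repr(span.text), repr(span.text_with_ws))
```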
