Features

Please refer to the docstrings in the source code for up-to-date descriptions of each feature. They cover the feature definition, technical details, and known differences from the Writeprints-Static feature set of Brennan, Afroz, and Greenstadt (2012). An example docstring is shown below.

"""avg_word_length

Counts the average number of characters for words in the text.

The length of the concatenation of all words is divided by the total number of words.

Known differences from the Writeprints Static feature "average word length": None.

Args:
    word_tokens: List of lists of token.text in spaCy doc instances.

Returns:
    Average length of words in the document.
"""

Caveat

The writeprints-static package uses spaCy 2.x's default tokenizer under the hood. Many other tokenizers define a word token differently, and this affects every feature whose calculation depends on that definition. For instance, NLTK 3.5's default tokenizer (word_tokenize) disagrees with spaCy's in some respects.
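To see how the definition of a word token changes a feature value, compare two tokenizations of the same sentence. Both token lists below are hypothetical and hard-coded for illustration. They are not claimed to be the exact output of spaCy or NLTK. The helper `avg_word_length` is also a hypothetical stand-in that computes total characters of all words divided by the word count.

```python
from typing import List


def avg_word_length(words: List[str]) -> float:
    # Total characters of all words divided by the number of words.
    return len("".join(words)) / len(words) if words else 0.0


# Two hypothetical tokenizations of "I don't know." with punctuation dropped.
split_contraction = ["I", "do", "n't", "know"]
keep_contraction = ["I", "don't", "know"]

print(len(split_contraction), avg_word_length(split_contraction))  # 4 2.5
print(len(keep_contraction), avg_word_length(keep_contraction))    # 3, about 3.33
```

The characters are identical in both cases. Only the token boundaries differ, yet the total word count and the average word length both change.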