History

2.2.2 (2021-11-09)

  • kitty now takes in input a stopword list instead of a language (from which it gathered the stopwords)

  • solving a bug in the whitespace preprocessing function

  • adding a new preprocessing function that supports passing the stopwords as a list

  • deprecating whitespace preprocessing

  • minor fixes to kitty API

  • breaking change to kitty API, now uses WhiteSpacePreprocessingStopwords.

2.2.0 (2021-09-20)

  • introducing kitty

  • improving the documentation a lot

2.1.2 (2021-09-03)

2.1.0 (2021-07-16)

  • new model introduced SuperCTM

  • new model introduced β-CTM

2.0.0 (2021-xx-xx)

  • warning, breaking changes were introduced:
    • the order of the parameters in CTMDataset was changed (now first is contextual embeddings)

    • CTM takes in input bow_size, contextual_size instead of input_size and bert_size

    • changed the name of the parameters in the dataset

  • introduced early stopping

  • introduced visualization with pyldavis

1.8.2 (2021-02-08)

  • removed constraint over pytorch version. This should solve problems for Windows users

1.8.0 (2021-01-11)

  • novel way to handle text, we now allow for an easy usage of training and testing data

  • better visualization of the training progress and of the sampling process

  • removed old stuff from the documentation

1.7.1 (2020-12-17)

  • some minor updates to the documentation

  • adding a new method to visualize the topic using a wordcloud

  • save and load will now generate a warning since the feature has not been tested

1.7.0 (2020-12-10)

  • adding a new and much simpler way to handle text for topic modeling

1.6.0 (2020-11-03)

  • introducing the two different classes for ZeroShotTM and CombinedTM

  • depracating CTM class in favor of ZeroShotTM and CombinedTM

1.5.3 (2020-11-03)

  • adding support for Windows encoding by defaulting file load to UTF-8

1.5.2 (2020-11-03)

  • updated sentence-transformers version to 0.3.6

  • beta support for model saving and loading

  • new evaluation metrics based on coherence

1.5.0 (2020-09-14)

  • Introduced a method to predict the topics for a set of documents (supports multiple sampling to reduce variation)

  • Adding some features to bert embeddings creation like increased batch size and progress bar

  • Supporting training directly from lists without the need to deal with files

  • Adding a simple quick preprocessing pipeline

1.4.3 (2020-09-03)

  • Updating sentence-transformers package to avoid errors

1.4.2 (2020-08-04)

  • Changed the encoding on file load for the SBERT embedding function

1.4.1 (2020-08-04)

  • Fixed bug over sparse matrices

1.4.0 (2020-08-01)

  • New feature handling sparse bow for optimized processing

  • New method to return topic distributions for words

1.0.0 (2020-04-05)

  • Released models with the main features implemented

0.1.0 (2020-04-04)

  • First release on PyPI.