
lusparon332/sentiment

 
 


Sentiment

Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file lib/sentiment. To experiment with that code, run bin/console for an interactive prompt.

TODO: Delete this and the text above, and describe your gem

Installation

Add this line to your application's Gemfile:

gem 'sentiment'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install sentiment

Usage

To vectorize a list of words, use one of the following methods:

• To vectorize by TF, call clcTF(list), where list is the list of words from a text. The method returns a hash of the form {word1 => TF(word1), word2 => TF(word2), ... } with no duplicate keys;

• To vectorize by IDF, call clcIDF(list), where list is the list of words from a text. The method returns a hash of the form {word1 => IDF(word1), word2 => IDF(word2), ... } with no duplicate keys;

• To vectorize by TF-IDF, call clcTFIDF(list), where list is the list of words from a text. The method returns a hash of the form {word1 => TF-IDF(word1), word2 => TF-IDF(word2), ... } with no duplicate keys.
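For reference, the textbook formulas behind these three vectorizations can be sketched as follows. This is a minimal sketch under stated assumptions, not the gem's code: the helper names tf, idf and tf_idf are hypothetical (they are not clcTF/clcIDF/clcTFIDF), IDF uses the common ln(N / df) form, and because IDF depends on how many texts contain a word, the sketch takes a whole corpus (a list of word lists) for it. The gem's own definitions may differ.

```ruby
# Hypothetical helpers illustrating textbook TF, IDF and TF-IDF.
# NOT the gem's clcTF / clcIDF / clcTFIDF; names and signatures are assumptions.
# Requires Ruby 2.7+ (Array#tally).

# TF(w) = (occurrences of w in the text) / (number of words in the text)
def tf(words)
  total = words.size.to_f
  words.tally.transform_values { |count| count / total }
end

# IDF(w) = ln(N / df(w)), where N is the number of texts in the corpus
# and df(w) is the number of texts that contain w at least once.
def idf(corpus)
  n = corpus.size.to_f
  df = Hash.new(0)
  corpus.each { |text| text.uniq.each { |w| df[w] += 1 } }
  df.transform_values { |d| Math.log(n / d) }
end

# TF-IDF(w) = TF(w) * IDF(w), one entry per distinct word of the text.
# Words missing from the corpus get IDF 0.0 here (an arbitrary choice).
def tf_idf(words, corpus)
  idf_values = idf(corpus)
  tf(words).to_h { |w, t| [w, t * idf_values.fetch(w, 0.0)] }
end

corpus = [%w[кот собак кот], %w[лаб контрольн контрольн семинар]]
scores = tf_idf(corpus[0], corpus)
# scores['кот'] is (2.0 / 3) * Math.log(2): 'кот' is 2 of 3 words in its text
# and appears in 1 of the 2 texts.
```

A word that occurs in every text gets IDF ln(1) = 0, so TF-IDF down-weights words that do not distinguish one text from another.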

1) To count the frequency of each word in a corpus, use the first function, words_in_corpus_frequency.

Example:

Input corpus (a list of texts, each a list of stemmed Russian words): [%w[кот собак кот], %w[лаб контрольн контрольн семинар]]

Output value: dictionary_frequency (a hash)

dictionary_frequency = words_in_corpus_frequency(corpus)

dictionary_frequency is {
  'кот' => 2, 'собак' => 1, 'лаб' => 1, 'контрольн' => 2,
  'семинар' => 1 }
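Counting words across a corpus amounts to flattening the texts into one list and tallying it. A minimal sketch, assuming Ruby 2.7+ for Array#tally and using the hypothetical name count_words (the gem's words_in_corpus_frequency may be implemented differently):

```ruby
# Hypothetical stand-in for words_in_corpus_frequency (requires Ruby 2.7+).
def count_words(corpus)
  # corpus is a list of texts; each text is a list of words.
  # Flatten to a single word list, then count the occurrences of each word.
  corpus.flatten.tally
end

corpus = [%w[кот собак кот], %w[лаб контрольн контрольн семинар]]
frequency = count_words(corpus)
puts frequency['кот']       # prints 2
puts frequency['контрольн'] # prints 2
```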

2) To delete low-frequency words (controlled by min_freq, the third argument) and high-frequency words (controlled by max_freq, the fourth argument) from the input corpus (the first argument), using dictionary_frequency (the second argument), use the second function, delete_words_with_high_and_low_frequency.

Example:

Input corpus: [%w[рыбак актёр крокодил], %w[крокодил крокодил крокодил актёр],
  %w[актёр рыбак крокодил крокодил], %w[крокодил крокодил актёр], %w[крокодил актёр крокодил],
  %w[крокодил крокодил крокодил]]

dictionary_frequency = words_in_corpus_frequency(corpus) (see the first function)

min_freq = 0.2 (words with frequency < 0.2 are deleted)

max_freq = 0.9 (the top 1 - 0.9 = 0.1, i.e. 10%, of the highest-frequency words are deleted; here this affects only one word)

Output value: the changed corpus

corpus = delete_words_with_high_and_low_frequency(corpus, dictionary_frequency, min_freq, max_freq)

The changed corpus is:

[%w[актёр],
 %w[крокодил крокодил актёр],
 %w[актёр крокодил крокодил],
 %w[крокодил крокодил актёр],
 %w[крокодил актёр крокодил],
 %w[крокодил крокодил крокодил]]
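The corpus in this example has 20 words: рыбак occurs 2 times (2/20 = 0.10), актёр 5 times (5/20 = 0.25) and крокодил 13 times (13/20 = 0.65), so with min_freq = 0.2 only рыбак falls below the threshold. The output also drops exactly two occurrences of крокодил (the first one in each of the first two texts), which is 10% of the 20 words. The sketch below is one hypothetical reading that reproduces the example output under these assumptions: frequency is a word's share of all words, rare words are removed everywhere, and round((1 - max_freq) * total) occurrences are removed from the single most frequent word, earliest first. The name filter_corpus is made up, and the gem's actual rule may differ.

```ruby
# Hypothetical reconstruction of delete_words_with_high_and_low_frequency.
# Assumptions (not confirmed by the gem):
#   * frequency of a word = its count / total number of words in the corpus;
#   * words with frequency < min_freq are removed from every text;
#   * round((1 - max_freq) * total) occurrences of the single most frequent
#     word are removed, starting from its first occurrences in corpus order.
def filter_corpus(corpus, counts, min_freq, max_freq)
  total = counts.values.sum.to_f
  rare = counts.select { |_, c| c / total < min_freq }.keys
  top_word = counts.max_by { |_, c| c }&.first
  to_drop = (total * (1 - max_freq)).round # shared across all texts

  corpus.map do |text|
    text.reject do |w|
      next true if rare.include?(w) # low-frequency word: always removed
      if w == top_word && to_drop.positive?
        to_drop -= 1 # remove this occurrence of the most frequent word
        true
      else
        false
      end
    end
  end
end

corpus = [%w[рыбак актёр крокодил], %w[крокодил крокодил крокодил актёр],
          %w[актёр рыбак крокодил крокодил], %w[крокодил крокодил актёр],
          %w[крокодил актёр крокодил], %w[крокодил крокодил крокодил]]
changed = filter_corpus(corpus, corpus.flatten.tally, 0.2, 0.9)
```

The input corpus is not modified; map and reject build new arrays.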

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/sentiment. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Sentiment project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
