What color is deception: a 5-day experiment

I recently had a simple idea: colorize deceptive reviews based on their sentiment. The principle was clear: analyze the sentiment of a review and, based on its neutral/positive/negative distribution, color the review accordingly (blue for neutral, red for negative, green for positive). My guess was that deceptive reviews would look more 'juicy', although I was not sure at the beginning. I worked on this project for five days in my spare time and published daily updates on my LinkedIn page. Here is a summary of those posts and some results. I hope you find it interesting. Up we go!

Day 1: Define the idea, prepare the data

The idea of this project was quite straightforward. Suppose we have a truthful and a deceptive review, both of the same polarity (positive or negative). As most studies claim, deceptive reviews may contain underlying patterns that are visible to a machine but not to a human reader.

In some cases, sentiments are exaggerated (e.g. "I looooved the service", "it was just BRILLIANT"). So what if we could colorize the reviews and evaluate the level of exaggeration at a glance?

To analyze the sentiment we used the Azure Text Analytics API. Besides a single overall sentiment, it returns separate scores for the positive, neutral and negative aspects of a text. This matters because the sentiment of a review is not always uniform: some reviews mix negative and positive aspects.

For the experiment we used the Ott Deceptive Opinion Spam corpus, a gold-standard dataset commonly used to benchmark machine learning models that aim to automatically detect deception in user reviews.

For convenience, I created a set of helpers that quickly return the needed subset of the Ott corpus. Something like this:

import pandas as pd


def read_ott():
    # The CSV location depends on where the script is launched from, so try both paths.
    try:
        ott_path = '../local_datasets/deceptive-opinion.csv'
        ott_deceptive = pd.read_csv(ott_path, encoding='utf-8', sep=',', engine='python')
    except FileNotFoundError:
        ott_path = './local_datasets/deceptive-opinion.csv'
        ott_deceptive = pd.read_csv(ott_path, encoding='utf-8', sep=',', engine='python')
    return ott_deceptive


def get_ott_negative():
    # Keep only the reviews with negative polarity.
    ott_dataframe = read_ott()
    ott_dataframe_negative = ott_dataframe[ott_dataframe['polarity'] == 'negative']
    return ott_dataframe_negative
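The experiment eventually needs four subsets (deceptive and truthful, negative and positive), so one generic selector may be handier than one function per subset. The sketch below is only an illustration, not the project's code: it uses the standard csv module on a tiny invented sample instead of pandas and the real file, and the `deceptive` column name with its `truthful`/`deceptive` values is an assumption about the CSV layout (only `polarity` appears in the helpers above).

```python
import csv
import io

# Tiny in-memory stand-in for the corpus; these rows are invented for illustration.
SAMPLE_CSV = (
    "deceptive,polarity,text\n"
    "truthful,positive,Nice stay\n"
    "deceptive,positive,Best hotel EVER\n"
    "truthful,negative,Room was small\n"
    "deceptive,negative,Worst night of my life\n"
)


def select_reviews(csv_file, polarity, label, label_column="deceptive"):
    """Return the texts whose polarity and label match.

    label_column is a hypothetical column name; adjust it to the real CSV header.
    """
    reader = csv.DictReader(csv_file)
    return [
        row["text"]
        for row in reader
        if row["polarity"] == polarity and row[label_column] == label
    ]


fake_negative = select_reviews(io.StringIO(SAMPLE_CSV), "negative", "deceptive")
true_positive = select_reviews(io.StringIO(SAMPLE_CSV), "positive", "truthful")
```

With the real corpus, the same function could take an open file handle instead of the in-memory sample.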

Day 2: First results

Here is the result of the very first run of the project (I was very excited while launching the experiment). And this is what it looked like! The image below represents a collection of 400 deceptive (fake) hotel reviews of negative polarity.

The participants were asked to write negative reviews, which were collected via Amazon Mechanical Turk (Ott et al.). To get the sentiment, I created a cognitive helper:

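def get_overall_sentiment(phrase):
    # text_analytics_client is the Azure Text Analytics client, created elsewhere
    # with the endpoint and key of the service.
    documents = [phrase]
    response = text_analytics_client.analyze_sentiment(documents, language='en')
    # Skip documents the service could not process.
    result = [doc for doc in response if not doc.is_error]
    output = result[0]

    # Confidence scores for each of the three sentiment classes.
    positive = output.confidence_scores.positive
    neutral = output.confidence_scores.neutral
    negative = output.confidence_scores.negative

    return positive, neutral, negative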

Each pixel is a review, colored according to its sentiment in BGR format (blue, green, red rather than RGB, because I am using OpenCV). As you can see, this is what internet trolls look like to a computer! The next step is to run the same ML model on truthful negative reviews.

P.S. Surprisingly, there are some green pixels.

Day 3: Deceptive Reviews vs Constructive Criticism

After the previous run it was still unclear to me whether deception can have a color. On the third day, however, I was more optimistic, as there was now something to compare against. I ran the same pipeline on the truthful negative reviews, and we could observe some evident differences.

As you may notice from the two pictures above, the deceptive reviews are brighter, with considerably fewer green spots. This suggested that there was some exaggeration in the fake comments, so the next step was to verify whether the same pattern could be observed in positive reviews.

N.B. To create an image out of the array of sentiments, I first converted each sentiment to BGR values and then generated an image from those values:

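from math import ceil

import cv2
import numpy as np


def sentiment_to_opencv(overall_sentiment):
    # Scale each confidence score (0..1) to a channel value (0..255).
    positive, neutral, negative = overall_sentiment
    blue = ceil(neutral * 255)
    green = ceil(positive * 255)
    red = ceil(negative * 255)

    return blue, green, red


def generate_image(reviews_bgr, image_name):
    # The image is a fixed 20x20 grid, so exactly 400 reviews are expected.
    if len(reviews_bgr) != 400:
        return False
    image = np.zeros((20, 20, 3), dtype=np.uint8)
    counter = 0
    # Fill the grid row by row, one review per pixel.
    for i in range(20):
        for j in range(20):
            image[i][j] = reviews_bgr[counter]
            counter += 1
    return cv2.imwrite(image_name, image)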
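To make the mapping concrete, here is a small worked example that needs neither OpenCV nor the Azure service. It repeats the arithmetic of sentiment_to_opencv and shows where a review lands in the 20x20 grid filled row by row by generate_image. The scores are invented for illustration and are not taken from the corpus; scores_to_bgr and pixel_position are hypothetical names used only in this sketch.

```python
from math import ceil


def scores_to_bgr(positive, neutral, negative):
    """Same mapping as sentiment_to_opencv: neutral -> blue, positive -> green, negative -> red."""
    return ceil(neutral * 255), ceil(positive * 255), ceil(negative * 255)


def pixel_position(review_index, width=20):
    """Row and column of a review when the grid is filled row by row."""
    return divmod(review_index, width)


# Invented scores for a strongly negative review.
example_bgr = scores_to_bgr(positive=0.02, neutral=0.08, negative=0.90)
print(example_bgr)         # (21, 6, 230): almost pure red
print(pixel_position(45))  # (2, 5): third row, sixth column
```

A review dominated by the negative score therefore shows up as a bright red pixel, while a mixed review produces a darker, muddier color.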

Day 4: Grass is always greener on the other side of the fence

After such promising results in the previous experiment, it seemed sensible to run the same pipeline on positive reviews (deceptive and truthful). And what do we see here?

In the deceptive reviews the colors are much brighter, with fewer red spots. We can literally see how certain services are being falsely flattered. Truthful reviews, on the other hand, tend to be more realistic, as nothing is perfect in our world. There was definitely something interesting in this. The next step was to merge these pixels into one uniform color that would actually represent deception.

Day 5: Lie to me

At the beginning of our journey we said that we wanted one color, not many. Averaging all the pixels into one uniform color gave surprisingly interesting results.

When we put all the pixels on a canvas, it becomes evident that we can find the exaggeration in deceptive comments using NLP tools such as Azure sentiment analysis. And we can literally see it! As you may have noticed, truthful negative reviews are not as red as deceptive ones, and fake positive reviews are greener than truthful ones.

N.B. Averaging the pixels was quite straightforward: split each pixel into its three channels, average every channel, and merge the averages back into one pixel.

def mix_colors(reviews_bgr, image_name):
    # Uses ceil, np and cv2 imported for the helpers above.
    # Sum each channel over all reviews.
    total_blue = 0
    total_green = 0
    total_red = 0
    for blue, green, red in reviews_bgr:
        total_blue += int(blue)
        total_green += int(green)
        total_red += int(red)

    # Divide by the number of reviews to get the average of each channel.
    average_blue = ceil(total_blue / len(reviews_bgr))
    average_green = ceil(total_green / len(reviews_bgr))
    average_red = ceil(total_red / len(reviews_bgr))
    # Paint a 150x150 canvas with the averaged color.
    image = np.zeros((150, 150, 3), dtype=np.uint8)
    image[:] = (average_blue, average_green, average_red)

    return cv2.imwrite(image_name, image)
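The same split, average and merge steps can be written more compactly. The following is a minimal sketch in plain Python, assuming the pixels are (blue, green, red) tuples as produced above; average_bgr is a hypothetical name, and the three pixels are invented for illustration. It returns the averaged color instead of writing an image.

```python
from math import ceil


def average_bgr(pixels):
    """Average a list of (blue, green, red) pixels channel by channel."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    # zip(*pixels) splits the pixels into three channels.
    blues, greens, reds = zip(*pixels)
    count = len(pixels)
    # Average each channel and merge the results back into one pixel.
    return tuple(ceil(sum(channel) / count) for channel in (blues, greens, reds))


# Three invented, mostly red pixels.
mixed = average_bgr([(10, 20, 230), (30, 0, 200), (20, 40, 210)])
print(mixed)  # (20, 20, 214)
```

Rounding up with ceil mirrors mix_colors; plain rounding would shift a channel by at most one level, which is not visible to the eye.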

It was a fascinating five-day journey. I have learned a lot, and it seems to me that this is the beginning of something interesting. I hope it was interesting for you too. If you want to experiment yourself, you can find the code here. As you will need a Text Analytics API key, the best solution is to create a free Azure account and start playing with Cognitive Services. However, if you find that complicated or just want to give it a try without paying for it, feel free to contact me by email; I may be able to provide a temporary key so you can play with the service. May the force be with you!
