Cloud Architecture Deep Dive: Virtual Companion Chatbot

Our virtual companion chatbot leverages a cloud architecture that combines several AWS services to deliver a scalable, reliable, and efficient solution. This article takes a deep dive into the technical infrastructure that powers our emotional intelligence system.

[Figure: Cloud architecture diagram showing the complete system design]

Building a Cloud-Powered Virtual Companion: A Deep Dive into Emotionally Intelligent Chatbots

Abstract

This research introduces a novel virtual companion chatbot designed to provide emotional support and foster social interaction. The chatbot employs a dual-stage process to predict user emotions:

  1. Text analysis, which assesses the user's emotional state from written input
  2. Audio analysis, which captures vocal cues from recorded speech

These outputs are combined into a cohesive emotional profile, which is then processed by a Large Language Model (LLM) within a predefined template. This ensures the chatbot's responses are emotionally aware, supportive, and non-harmful.

Implementation Methodology

The system architecture involves the following key components (a minimal upload-endpoint sketch appears after the list):

  • Real-time audio analysis and response generation
  • Amazon EC2 instances for hosting Node.js and FastAPI applications
  • Amazon S3 buckets for storing uploaded audio files and intermediate processing results
  • Integration with external LLM APIs (e.g., OpenAI's GPT or Anthropic's Claude) for final response generation
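
To make the EC2-hosted FastAPI layer concrete, here is a minimal sketch of an upload endpoint that stores an incoming recording in S3. The route, bucket name, and key scheme are hypothetical placeholders, not the production configuration.

```python
import uuid

import boto3
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
s3 = boto3.client("s3")
BUCKET = "companion-audio-uploads"  # hypothetical bucket name

@app.post("/audio")
async def upload_audio(file: UploadFile = File(...)):
    # Store the raw recording under a unique key for later processing.
    key = f"uploads/{uuid.uuid4()}-{file.filename}"
    s3.upload_fileobj(file.file, BUCKET, key)
    return {"s3_key": key}
```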

Current Implementation

  • Uses Amazon EC2 instances for hosting applications
  • Implements auto-scaling groups across multiple availability zones
  • Utilizes Amazon S3 for data storage
  • Integrates with external LLM APIs
  • Uses Application Load Balancer (ALB) for traffic distribution
  • Configures VPCs and security groups for network management
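
The auto-scaling setup could be provisioned along these lines with boto3; all resource names, subnet IDs, and ARNs below are hypothetical placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="companion-api-asg",            # hypothetical
    LaunchTemplate={
        "LaunchTemplateName": "companion-api-template",  # hypothetical
        "Version": "$Latest",
    },
    MinSize=2,
    MaxSize=6,
    DesiredCapacity=2,
    # Subnets in different availability zones, so the group survives a zone outage.
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",
    # The ALB routes traffic to healthy instances in this target group.
    TargetGroupARNs=["arn:aws:elasticloadbalancing:...:targetgroup/companion/abc"],
)
```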

Future Optimizations

  • Transition to serverless and edge computing technologies
  • Replace EC2 instances with AWS Lambda functions
  • Implement API Gateway for request management
  • Utilize AWS Fargate for serverless container orchestration
  • Leverage AWS CloudFront with Lambda@Edge for edge computing
  • Implement additional AWS services:
    • Amazon Comprehend for sentiment analysis
    • AWS Step Functions for workflow orchestration
    • DynamoDB for data storage
    • ElastiCache for distributed caching
  • Use AWS X-Ray for distributed tracing
  • Implement CloudWatch for logging and alerts
  • Utilize AWS Budgets and Cost Explorer for cost management
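
As one example of the serverless direction, the sketch below shows a Lambda handler that delegates text sentiment to Amazon Comprehend. The event shape (a JSON body with a text field) is an assumed API Gateway contract.

```python
import json

import boto3

comprehend = boto3.client("comprehend")

def handler(event, context):
    # Assumed contract: API Gateway proxy integration with {"text": "..."} in the body.
    text = json.loads(event["body"])["text"]
    result = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    return {
        "statusCode": 200,
        "body": json.dumps({
            "sentiment": result["Sentiment"],       # e.g. "POSITIVE"
            "scores": result["SentimentScore"],     # per-class confidences
        }),
    }
```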

Sentiment Classification on Audio Files

The system extracts the following features for audio sentiment classification (a feature-extraction sketch follows the list):

  • 39 MFCCs (Mel-Frequency Cepstral Coefficients)
  • Zero Crossing Rate (ZCR)
  • Teager Energy Operator (TEO)
  • Harmonic to Noise Ratio (HNR)
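
The original audio stack is not specified in the write-up, so the sketch below uses librosa as an assumption; HNR is usually computed with a Praat-based tool and is left as a comment.

```python
import librosa
import numpy as np

def extract_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)

    # 39 mel-frequency cepstral coefficients, averaged over time.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=39).mean(axis=1)

    # Zero crossing rate: how often the waveform changes sign.
    zcr = librosa.feature.zero_crossing_rate(y).mean()

    # Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    teo = (y[1:-1] ** 2 - y[:-2] * y[2:]).mean()

    # HNR would be appended here (e.g. via a Praat binding such as parselmouth).
    return np.concatenate([mfcc, [zcr, teo]])
```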

Model Architecture

The sentiment classification model uses a divide-and-conquer approach with four binary Support Vector Machine (SVM) classifiers:

  1. Angry vs. Sad (94.49% accuracy)
  2. Angry vs. Happy (72% accuracy)
  3. Fear vs. Sad (71.05% accuracy)
  4. Happy vs. Sad (86.93% accuracy)

Because misclassifications compound when the four binary decisions are combined, the overall architecture achieves approximately 60% accuracy on the full multi-class dataset.
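
The pairwise design could be implemented with scikit-learn as sketched below. The write-up does not detail how the four binary decisions are merged, so the majority vote here is an illustrative assumption.

```python
from collections import Counter

import numpy as np
from sklearn.svm import SVC

PAIRS = [("angry", "sad"), ("angry", "happy"),
         ("fear", "sad"), ("happy", "sad")]

def train_pairwise_svms(X, y):
    """Fit one binary SVM per emotion pair, using only that pair's samples."""
    models = {}
    for a, b in PAIRS:
        mask = np.isin(y, [a, b])
        models[(a, b)] = SVC(kernel="rbf").fit(X[mask], y[mask])
    return models

def predict_emotion(models, x):
    # Each pairwise classifier casts one vote; the most common label wins.
    votes = Counter(m.predict([x])[0] for m in models.values())
    return votes.most_common(1)[0][0]
```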

Sentiment Analysis of Text Using BERT

The text sentiment analysis component uses BERT (Bidirectional Encoder Representations from Transformers) and involves the following steps (a fine-tuning sketch follows the list):

  1. Dataset preparation and tokenization
  2. Model initialization (BertForSequenceClassification)
  3. Training loop with AdamW optimizer
  4. Evaluation using metrics like accuracy and F1-score
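
A minimal version of this fine-tuning loop, using Hugging Face Transformers; the label count, hyperparameters, and sample data are illustrative assumptions.

```python
import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4  # e.g. angry / happy / sad / fear
)
optimizer = AdamW(model.parameters(), lr=2e-5)

texts = ["I can't believe you did that!"]  # placeholder training data
labels = torch.tensor([0])                 # placeholder labels

model.train()
for epoch in range(3):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**batch, labels=labels)  # loss is computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```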

Integration and Response Generation

The final stage integrates predictions from both text and audio sentiment analysis models. The identified sentiments are structured into a prompt template, which serves as input for the Large Language Model (LLM) to generate the final response.
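
A sketch of that fusion step: the two predicted sentiments are slotted into a fixed template before the LLM call. The template wording and the client call are illustrative, not the production prompt.

```python
def build_prompt(text_sentiment: str, audio_sentiment: str, message: str) -> str:
    # Combine both modalities into one emotionally aware instruction.
    return (
        "You are a supportive companion. The user's words suggest they feel "
        f"{text_sentiment}, and their tone of voice suggests {audio_sentiment}. "
        "Respond with empathy and avoid harmful or dismissive language.\n\n"
        f"User: {message}"
    )

prompt = build_prompt("sad", "angry", "Nothing went right today.")
# response = llm_client.complete(prompt)  # e.g. an OpenAI or Anthropic API call
```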

Conclusion and Future Work

The project demonstrates the potential of multi-modal sentiment analysis in creating emotionally intelligent chatbots. Future work will focus on:

  1. Enhancing the accuracy of the audio sentiment analysis model
  2. Expanding and curating a more representative dataset
  3. Fine-tuning the Large Language Model for more precise and contextually appropriate responses

This approach paves the way for more empathetic and human-like digital companions.