AT&T Inc. is an American multinational telecommunications holding company headquartered at Whitacre Tower in Downtown Dallas, Texas. It is the world's largest telecommunications company by revenue and the third largest provider of mobile telephone services in the U.S. As of 2022, AT&T was ranked 13th on the Fortune 500 rankings of the largest United States corporations, with revenues of $168.8 billion! 😮
One of the main pain points AT&T users face is constant exposure to spam messages.
AT&T has been flagging spam messages manually for some time, but it is now looking for an automated way to detect spam and protect its users.
Your goal is to build a spam detector that can automatically flag spam messages as they arrive, based solely on the content of the SMS.
To start off, AT&T would like you to use the following dataset:
Download the Dataset
Here are a few tips to help you complete this project:
A good deep learning model does not necessarily have to be super complicated!
You do not have access to a whole lot of data, so channeling the power of a more sophisticated model trained on billions of observations (transfer learning) might help!
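As an illustration of the first tip, a small model only needs each message as a fixed-length sequence of integer ids, which an embedding layer can then consume. The sketch below is one possible way to build that input with the standard library only. The function names, the whitespace tokenizer, the reserved ids, and the sample messages are all assumptions made for illustration, not part of the brief.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids (assumed): padding and out-of-vocabulary tokens


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def build_vocab(texts, max_size=1000):
    """Map the most frequent tokens to integer ids starting at 2."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    most_common = [tok for tok, _ in counts.most_common(max_size)]
    return {tok: idx for idx, tok in enumerate(most_common, start=2)}


def encode(text, vocab, max_len=8):
    """Turn one message into exactly max_len ids (truncate or right-pad)."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical messages, for illustration only.
messages = ["Free prize call now", "See you at lunch", "Call me now"]
vocab = build_vocab(messages)
encoded = [encode(m, vocab) for m in messages]
```

In a real notebook, the tokenizer of a deep learning framework or of a pretrained model would usually replace these helpers. The point is only that the input to a simple model can stay this small.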
To complete this project, your team should:
- Write a notebook that runs preprocessing and trains one or more deep learning models to predict whether an SMS is spam or ham
- State the achieved performance clearly
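The preprocessing step mentioned in the deliverables could start along these lines: normalize the text, encode the labels as integers, drop empty and duplicate messages, and hold out a test set that keeps the spam/ham proportions. This is a minimal sketch under stated assumptions: the `(label, text)` row layout, the `ham`/`spam` label strings, the helper names, and the sample rows are hypothetical, and the real dataset may be organized differently.

```python
import random
import re
from collections import defaultdict

LABELS = {"ham": 0, "spam": 1}  # assumed label strings


def clean(text):
    """Lowercase, collapse runs of whitespace, strip both ends."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def preprocess(rows):
    """Clean (label, text) pairs; drop empty texts and exact duplicates."""
    seen, out = set(), []
    for label, text in rows:
        text = clean(text)
        if text and text not in seen:
            seen.add(text)
            out.append((text, LABELS[label]))
    return out


def stratified_split(examples, test_ratio=0.25, seed=42):
    """Split per class so train and test keep similar spam/ham proportions.

    Each class contributes at least one example to the test set, so a class
    with a single example would end up absent from the training set.
    """
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    rng = random.Random(seed)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_ratio))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical rows for illustration; the real dataset's columns may differ.
rows = [
    ("ham", "See you at  lunch"),
    ("ham", "see you at lunch"),  # duplicate once cleaned
    ("ham", "Running late, sorry"),
    ("ham", "Call me when you land"),
    ("ham", "Ok thanks"),
    ("spam", "WIN a FREE prize now!!!"),
    ("spam", "Claim your reward today"),
    ("spam", "   "),  # empty once cleaned
]
examples = preprocess(rows)
train, test = stratified_split(examples)
```

Fixing the seed keeps the split reproducible across notebook runs, which makes the reported performance easier to compare between models.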
Python 3.12.1
For a project like AT&T, it might look something like this:
[ ] Find a good LLM to handle spam detection
[ ] Fine-tune it on the dataset to improve it
[ ] Compare it with a best-in-class model from Hugging Face
- What tools do you have to use? Lightning AI and Colab, to train and fine-tune the model.
- What processes do you need to put into place?
  - Select a pretrained model
  - Fine-tune our models
  - Evaluate them
- What questions do you need to answer? Spam or ham? That is the question.
- What problems do you need to solve? Cleaning the data and fine-tuning the model.
- What specific files do you need to hand in for the certification? A notebook and a brief report on the evaluation results of the models.
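For the evaluation brief, the metrics could be computed the same way for every model so that the results are directly comparable. The sketch below is one possible approach using only the standard library. It reports precision, recall, and F1 on the spam class next to accuracy, because accuracy alone can look good if spam turns out to be the minority class. The function name, the label convention (1 = spam), and the predictions are hypothetical and are not results from any real model.

```python
def spam_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision/recall/F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels and predictions, for illustration only (1 = spam, 0 = ham).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 0, 0, 0, 0, 0, 0],  # misses one spam message
    "model_b": [0, 0, 0, 0, 0, 0, 0, 0],  # always predicts ham
}
reports = {name: spam_metrics(y_true, y_pred)
           for name, y_pred in predictions.items()}
best = max(reports, key=lambda name: reports[name]["f1"])
```

In this toy example the always-ham model still reaches 62.5% accuracy while catching no spam at all, which is why selecting the best model on spam-class F1 (or recall) may be a more honest summary than accuracy.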