For this project, I wanted to see how the titles of news articles about Bitcoin affect its price. I analyzed 500,000 words from 50,000 Bitcoin-related article titles. To do this, I needed a few things:
- A list of the past three years of Bitcoin prices down to the minute
- An RSS feed that had a long history of articles
- A database to quickly summarize data
Once I had my dataset, I started by parsing the price data from the CSV file into my program. After that, I parsed the RSS feed for the last 50,000 articles about Bitcoin. The titles of all 50,000 articles were then split into their individual words.
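A minimal sketch of the title-splitting step might look like the following. It is not the code from the project: the function name `split_title`, the lowercasing, and the rule for what counts as a word are all assumptions made for illustration.

```python
import re


def split_title(title):
    """Split an article title into lowercase words.

    Assumption: a "word" is a run of letters, digits, or apostrophes,
    and case is ignored so "CEO" and "ceo" count as the same word.
    """
    return re.findall(r"[a-z0-9']+", title.lower())


words = split_title("Bitcoin CEO says: price will rise!")
print(words)
```

Lowercasing before splitting keeps the later per-word averages from being spread across different capitalizations of the same word.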
I used each article's timestamp to look up the matching timestamp in the parsed CSV file. Then I added fixed offsets to that timestamp to find out how much the price had changed after the article was posted. I did this in 5-minute intervals, all the way up to 55 minutes. As the program worked through the articles, it appended these results to an Excel file.
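The lookup described above can be sketched roughly as follows. The function name `price_changes`, the layout of `prices` (a dict keyed by minute-resolution `datetime`), and the choice to skip missing minutes are assumptions for illustration, not details taken from the project.

```python
from datetime import datetime, timedelta

# Offsets of 5, 10, ..., 55 minutes after publication.
OFFSETS_MINUTES = range(5, 60, 5)


def price_changes(prices, published):
    """Return {offset_minutes: price change} for one article.

    Assumption: `prices` maps minute-resolution datetimes to a price.
    Offsets with no matching price row are skipped.
    """
    # Truncate to the minute so the timestamp matches the price data.
    start = published.replace(second=0, microsecond=0)
    if start not in prices:
        return {}
    base = prices[start]
    changes = {}
    for minutes in OFFSETS_MINUTES:
        later = start + timedelta(minutes=minutes)
        if later in prices:
            changes[minutes] = prices[later] - base
    return changes


# Made-up prices, purely to show the shape of the data.
sample_prices = {
    datetime(2018, 1, 1, 12, 0): 100.0,
    datetime(2018, 1, 1, 12, 5): 101.5,
    datetime(2018, 1, 1, 12, 10): 99.0,
}
sample_changes = price_changes(sample_prices, datetime(2018, 1, 1, 12, 0, 42))
```

Each word in the title would then be written out once with these changes, which is how 50,000 titles can turn into several hundred thousand records.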
Next, I opened this Excel file, which held over 500,000 records, in Microsoft Access. I ran a query on the data to consolidate identical words and compute an average price change for each word. After running the query, I was left with about 26,000 records in the dataset. I immediately noticed some interesting numbers. For example, titles containing the word “CEO” were followed, on average, by a downward trend in the price of Bitcoin.
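The consolidation itself was done with an Access query. As a rough Python stand-in for the same idea (group by word, then average), a sketch could look like this; the function name `average_change_by_word` and the `(word, change)` record layout are assumptions.

```python
from collections import defaultdict


def average_change_by_word(records):
    """Average the price change for each distinct word.

    Assumption: `records` is an iterable of (word, price_change) pairs,
    one pair per word occurrence.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for word, change in records:
        totals[word] += change
        counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


# Made-up records, purely for illustration.
sample_records = [("ceo", -2.0), ("ceo", -4.0), ("rise", 1.0)]
sample_averages = average_change_by_word(sample_records)
```

In the real data this would be done once per time offset, so each word ends up with an average change at 5 minutes, 10 minutes, and so on.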
After that, I exported these records back to Excel. This is the Excel file that I used to make price predictions for new articles.
The final Python program gives users a few choices. They can:
- Turn on an RSS monitor that will notify them whenever a new article comes out, as well as give them a price prediction based on the title of that article.
- View the 15 most recent articles posted online about bitcoin, and see what the price predictions were for those.
- Enter their own title/keywords and see what the price predictions would be.
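How the per-word averages are combined into a single prediction is not spelled out here, so the sketch below makes an explicit assumption: it averages the stored values of every word in the title that appears in the table. The function name `predict_change` and the `word_averages` dict are hypothetical.

```python
import re


def predict_change(title, word_averages):
    """Predict a price change for a title from per-word averages.

    Assumption: the prediction is the plain mean of the averages of all
    known words in the title. Returns None if no word is known.
    """
    words = re.findall(r"[a-z0-9']+", title.lower())
    known = [word_averages[w] for w in words if w in word_averages]
    if not known:
        return None
    return sum(known) / len(known)


# Made-up averages, purely for illustration.
example_averages = {"ceo": -3.0, "rise": 1.0}
example_prediction = predict_change("CEO says price will rise", example_averages)
```

Other combinations are just as plausible, such as summing the values or weighting each word by how often it appeared in the training data.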
View the full code on GitHub