Investigation of news trends and bias

with an emphasis on the New York Times



This dataset is used to investigate bias in media coverage. Throughout history, media sources have played a significant role in shaping public opinion. Nonetheless, they have often favored the interests of a few over those of the public. Authoritarian regimes frequently use the media as propaganda, and media coverage can even sway the outcome of democratic elections. With the growth of social media, it becomes even more critical to ensure that news organizations cover the entire spectrum of news in an unbiased manner. Our study examines this question. The project's objective is to ascertain how the themes and attitudes of the New York Times differ from those of other vendors: what distinguishes it from the others, and does it over-represent a particular segment of society?

Data exploration

Examine all of the data contained in quotes to gain a better understanding of the quotes and determine how we can analyze the trends and bias.

New York Times

Focus mainly on the quotes in the New York Times to analyze their trends and compare them with other vendors.

Expand the Timeline

Explore the New York Times from 2015 to 2020 and draw conclusions about its bias based on the previous work.

Data exploration of the quotes data

Before we assess the New York Times data, we examine all of the data contained in quotes. This gives us a better understanding of the quotes and of how we can analyze the trends and bias. We focus mainly on the trends in topics, speakers and vendors, and also explore NLP methods for topic detection and sentiment analysis.

General exploration for all vendors

We begin by examining the data to determine the relationship between topics and other characteristics such as vendors and speakers. For vendors, we want to see which topics each vendor primarily delivers. The same applies to speakers: we can observe which topics are preferred by certain individuals. In this way, we get a sense of the preferred topics of vendors and speakers. After examining the themes and sentiments covered by the New York Times, we can use this data to determine whether the New York Times has a bias.

1.1 Exploration for connection

In this section, we visualize how different New York Times topics are connected to different vendors and speakers. The dataset includes the URLs of the quotes. The URLs look something like this: . It is easy to see that NYTimes URLs follow a set pattern from which we can extract topics, such as real estate in this case. For details on how we do this, please refer to the uploaded code.
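As an illustration of the URL-based topic extraction described above, here is a minimal sketch. It assumes that NYT article paths have the form /YYYY/MM/DD/<section>/..., which is an assumption on our part and not taken from the uploaded code; the function name nyt_topic and the example URL are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: NYT article paths start with a date followed by the section,
# e.g. /2019/05/01/realestate/some-article.html
DATE_THEN_SECTION = re.compile(r"^/\d{4}/\d{2}/\d{2}/([^/]+)/")


def nyt_topic(url):
    """Return the section segment that follows the date in an NYT URL.

    Returns None for non-NYT hosts and for NYT URLs that do not match
    the assumed date-prefixed pattern.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host != "nytimes.com" and not host.endswith(".nytimes.com"):
        return None
    match = DATE_THEN_SECTION.match(parsed.path)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the pattern.
    print(nyt_topic("https://www.nytimes.com/2019/05/01/realestate/example-article.html"))
```

URLs that do not follow the assumed pattern simply yield None, so they can be counted and inspected separately instead of being assigned a wrong topic.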

The following graph indicates which news outlets carry quotes similar to those under different NYT topics. The green circles represent the topics and the blue circles represent the websites.

Vendor-topic exploration
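One possible way to build the edges of such a topic-website graph is sketched below. It assumes that each quote record carries a list of URLs under a "urls" key and that a quote appearing both under an NYT topic and on another website links the two; both points are our assumptions, and vendor_topic_edges and topic_of are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlparse


def vendor_topic_edges(quotes, topic_of):
    """Count (topic, website) links across quotes.

    quotes:   iterable of dicts with a "urls" list (assumed record layout).
    topic_of: callable mapping a URL to an NYT topic, or None when the URL
              has no recognizable NYT topic.
    """
    edges = Counter()
    for quote in quotes:
        urls = quote["urls"]
        topics = {topic_of(u) for u in urls} - {None}
        # Every URL without an NYT topic is treated as another website.
        vendors = {urlparse(u).netloc for u in urls if topic_of(u) is None}
        for topic in topics:
            for vendor in vendors:
                edges[(topic, vendor)] += 1
    return edges


if __name__ == "__main__":
    def toy_topic_of(url):
        # Toy stand-in for a real topic extractor.
        return "realestate" if "nytimes.com" in url else None

    sample = [{"urls": ["https://www.nytimes.com/a", "https://www.example.com/b"]}]
    print(vendor_topic_edges(sample, toy_topic_of))
```

The resulting counts can serve as edge weights when drawing the graph, with topics and websites as the two kinds of nodes.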


The following is a similar graph of speakers with topics.

Speaker-topic exploration


1.2 The trend of topics and sentiments

The picture below is an example of the topic trends over time in 2017. The topics are extracted with BERT, a natural language processing model. First, we can identify the top 10 topics that concerned people in 2017. Among them, the student-teacher-school topic attracts the most attention, followed by films and movies. Music is the third largest topic in 2017.

We observe that the trends follow the major events of each year. While some topics are omnipresent, such as movies, sports and education, politics became more popular during election years and their aftermath (e.g. Trump, White House). Similarly, 2020 saw many more articles related to health and the COVID disease. We also see that the numbers of articles with positive and negative sentiment are not too different, which shows that the overall sentiment is somewhat neutral.

The following is an interactive map for exploring the topics and their popularity.

The sentiments analysis

We visualize the words related to sentiment and divide them into four groups: positive, negative, neutral and compound sentiment words. The following pictures show the positive and negative words we extract from 2017.


Afterwards, we use the Flair sentiment analyzer to compare the counts of words related to these two sentiments. We can see that positive sentiment is more frequent than negative sentiment.


1.3 The trend of topics with speaker

The following picture shows the most frequent topics and an example of the distribution of speakers' professions within these topics.

We can remark that the "politician" occupation is over-represented both globally and in the most frequent topics. But we also observe that the media are not biased towards the most famous politicians, e.g. Trump. We can conclude that most quotes are related to politics and the government.


New York Times feature analysis

Besides the general investigation, we also focus on the New York Times data in order to compare the topic trends as well as the sentiment.

General research on New York Times

We apply the same method as in the general analysis to the New York Times data in order to compare the two. The chart below illustrates the topic trends and the mood of the New York Times quotations.

2.1 Topics and speakers analysis

Here is the topic trend for quotes provided by the New York Times in 2019. Some top topics are the same as in the general research, but there are differences. Compared with other vendors, the New York Times focuses more on politics and religion, with more reports on gender, China and the church. Livelihood issues such as traffic and agriculture receive less attention in New York Times reporting.

2.2 Sentiment elements analysis

For the sentiment words extracted, the positive phrases for the New York Times are similar to those of vendors in general. For negative words, other vendors dwell more on violence and crime, while the New York Times is more concerned with racism and war. As we concluded before, the New York Times tends to be more political. Examples of the sentiment trends are shown in the following pictures.



Preferences and bias of the New York Times

We estimate the preference and bias from the topics, the speakers, the sentiment-topic distribution, and the speaker-based popularity of topics.

Preference and bias analysis for the New York Times

3.1 The topic preference investigation

The following is a list of topic preferences for the New York Times compared with the general dataset. We can clearly see that the New York Times is more or less on par with the general consensus on the different topics.
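One simple way to make such a comparison concrete is to compare the share of quotes per topic in the two datasets. The sketch below is our own illustration under that assumption and not necessarily how the list was produced; topic_shares and preference_gap are hypothetical names.

```python
from collections import Counter


def topic_shares(topics):
    """Map each topic to its fraction of all items (empty input gives {})."""
    counts = Counter(topics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}


def preference_gap(nyt_topics, all_topics):
    """NYT share minus overall share per topic.

    Positive values mean the topic is more common in the NYT sample
    than in the general dataset; values near zero mean parity.
    """
    nyt = topic_shares(nyt_topics)
    overall = topic_shares(all_topics)
    return {
        topic: nyt.get(topic, 0.0) - overall.get(topic, 0.0)
        for topic in set(nyt) | set(overall)
    }


if __name__ == "__main__":
    # Toy labels, not taken from the dataset.
    print(preference_gap(
        ["politics", "politics", "politics", "sports"],
        ["politics", "sports"],
    ))
```

Gaps close to zero across topics would correspond to the "on par" observation above.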


3.2 The speaker preference investigation

This is a list of speaker preferences for the New York Times compared with the general dataset. As with topic preference, the most popular speakers in the NYT are the same as in the general dataset.


3.3 Sentiment bias analysis

In our sentiment analysis, we randomly choose 200,000 quotes from the New York Times as well as from the general dataset. We see that the NYT is on par with the general distribution of positive and negative sentiment. Further, neither distribution is biased in terms of sentiment.
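The sampling step can be sketched as follows. The classifier is passed in as a plain callable so the example stays self-contained; in practice it would wrap the sentiment model in use. The function name sentiment_shares, the fixed seed and the toy classifier are our assumptions.

```python
import random
from collections import Counter


def sentiment_shares(quotes, classify, sample_size, seed=0):
    """Draw a random sample of quotes and return the share of each label.

    quotes:      list of quote texts.
    classify:    callable returning a sentiment label for one quote
                 (a stand-in for a real sentiment model).
    sample_size: number of quotes to draw, capped at len(quotes).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(quotes, min(sample_size, len(quotes)))
    if not sample:
        return {}
    counts = Counter(classify(q) for q in sample)
    return {label: n / len(sample) for label, n in counts.items()}


if __name__ == "__main__":
    def toy_classify(text):
        # Toy rule, for illustration only.
        return "POSITIVE" if "good" in text else "NEGATIVE"

    quotes = ["good day", "good news", "good game", "bad call"]
    print(sentiment_shares(quotes, toy_classify, sample_size=4))
```

Running the same function on the NYT quotes and on the general dataset, with the same sample size, gives two label distributions that can be compared side by side.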


3.4 Topic-sentiment bias analysis


To further analyse the NYT, we extract the topics of NYT articles from their URLs (to keep the topics in the format the NYT uses). We notice that even at the topic level, the NYT is fairly unbiased, or slightly positively biased, across all the different topics.

3.5 Speaker-wise Analysis of Popularity


Now that we have established that the NYT, like the general media outlets, is quite unbiased in its reporting, we would like to observe whether the NYT over-represents the interests of a few individuals. To do this, we perform a small experiment. We define the popularity of a quote as the number of media outlets that have used it. Hence, if there are 30 URLs with the same quote, we define the popularity of the quote as 30.
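The popularity measure can be written in a few lines. The text counts URLs, so the first function below does the same; the second is a variant we add for comparison that counts distinct host names instead, which differs when one outlet publishes the same quote under several URLs. Both function names are hypothetical.

```python
from urllib.parse import urlparse


def quote_popularity(urls):
    """Popularity as in the 30-URL example: the number of URLs with the quote."""
    return len(urls)


def outlet_popularity(urls):
    """Variant (our assumption): count distinct host names, not URLs."""
    return len({urlparse(u).netloc for u in urls})


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    urls = [
        "https://a.example.com/story-1",
        "https://a.example.com/story-2",
        "https://b.example.org/story",
    ]
    print(quote_popularity(urls), outlet_popularity(urls))
```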

We would like to observe whether the quotes given by the most influential speakers become much more popular than the average. To do this, we plot the average popularity of a quote given by a top 10 speaker for a particular topic against the average popularity of quotes in that topic. We notice that for topics like Sports and Arts, the average popularity of influential speakers is the same as the overall popularity. But the popularity of the most influential speakers in topics like world affairs, business and opinion is much higher. Hence, while media outlets are unbiased in the sentiment of their reports, they are heavily biased towards a few individual speakers.
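The comparison described above can be sketched as below. The text does not say how the top 10 speakers of a topic are chosen, so this sketch assumes they are the speakers with the most quotes in that topic; the function name popularity_by_topic and the (topic, speaker, popularity) record layout are also assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean


def popularity_by_topic(records, top_n=10):
    """Compare top speakers with all speakers, topic by topic.

    records: iterable of (topic, speaker, popularity) tuples, one per quote.
    Returns {topic: (mean popularity for top speakers, mean for all quotes)}.
    Assumption: "top" speakers are those with the most quotes in the topic.
    """
    by_topic = defaultdict(list)
    for topic, speaker, popularity in records:
        by_topic[topic].append((speaker, popularity))

    result = {}
    for topic, rows in by_topic.items():
        quote_counts = Counter(speaker for speaker, _ in rows)
        top = {speaker for speaker, _ in quote_counts.most_common(top_n)}
        top_mean = mean(p for speaker, p in rows if speaker in top)
        overall_mean = mean(p for _, p in rows)
        result[topic] = (top_mean, overall_mean)
    return result


if __name__ == "__main__":
    # Toy records, not taken from the dataset.
    toy = [
        ("world", "A", 30), ("world", "A", 20), ("world", "B", 2),
        ("sports", "C", 4), ("sports", "D", 4),
    ]
    for topic, (top_mean, overall_mean) in popularity_by_topic(toy, top_n=1).items():
        print(topic, top_mean, overall_mean)
```

A topic where the first value is well above the second is one where the most quoted speakers draw disproportionate coverage.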




General Exploration on topics and sentiments


Focus on the analysis for the New York Times


Extend the timeline for data analysis

Qing Jun

Website making
