Korean popular culture analytics in social media streaming: evidence from YouTube channels in Thailand

A social network consists of a set of members and the interconnections among them [1]. Culture and nature form the foundation of a social network; however, most relations in real-life social networks remain hidden. Such connections have become transparent, open, and exploitable in the web era through social networking platforms such as Facebook, Twitter, Instagram, and YouTube [2]–[8]. Details of the network can also be collected and evaluated [9][10]. The network can be interpreted as a graph in which nodes represent network actors and edges (arrows) represent the relationships between them. Social Network Analysis (SNA) aims to reveal patterns within network systems and to study actors in order to detect the effect of those patterns on individuals and organizations [11]. The focus of information scientists and social networking experts has increasingly shifted towards enhancing people's motivation and experience when using social networks for information purposes. Social networking provides opportunities to connect and share information in order to address problems and increase awareness, and SNA offers new insights into the structure of the network [12]–[19].


Method
This study applies SNA to transform social media streaming information into knowledge. It provides insights into the critical role of social media in the distribution of alternative information. The steps of this study are: 1) collection of data from K-pop YouTube channels between June 9 and July 22, 2020; 2) exploration of the links and patterns generated in comments and replies through combined interactions on the K-pop YouTube channels; 3) identification and analysis of social roles and key players in the social network; 4) analysis of comments on videos, such as discussions, top words, pairs of words, and multiple correlations within the K-pop YouTube channels; and 5) suggestion of further actions to make information distribution through social media streaming more effective.

Analysis of the relationship between all YouTube channels
We began by looking for YouTube channels that are relevant to Korean popular culture. We used the keyword "Korea", extracted search results from all over the world, applied an iterative binary searching technique, and ranked the corresponding resources according to their search relevance. This investigation identified 1,241 network-correlated YouTube channels. We then used the Gephi software for SNA. The data were extracted with YouTube Data Tools (https://tools.digitalmethods.net/netvizz/youtube/).

Prevalence and Patterns of Korean Popular Culture YouTube Channel
The accelerated "crawling" of the K-pop YouTube community (data extraction) yielded 5,356 channels. This collection of 5,356 channels was "cleaned" by deleting messages that were not relevant to the research. The resulting network of prominent channels within the K-pop YouTube community contained 1,241 distinct vertices and 5,061 edges. The edges comprised original posts, comments, and references, all of which were directed. The representation of complex networks provides an understanding of the flow of information [28]. Gephi was designed from the outset to navigate complex networks quickly and intuitively. Its architecture supports graphs whose structure or content changes over time and provides a timeline component in which a slice of the network can be selected. The system retrieves all nodes and edges that fall within the time span of the timeline slice and updates the visualization module accordingly. A dynamic network can thus be played as a movie sequence. The dynamic module can receive network data from a graph file or from external data sources. During execution, a data source can keep sending network data to the dynamic controller, and the results are displayed immediately in the visualization module. For example, a web crawler can be connected to Gephi to watch the network being constructed over time. The architecture is interoperable, so third parties can conveniently build data sources that work with existing applications, databases, or web services [29]. Table 1 gives an overall description of the case graphs.

Performance from impacts and network analysis
This section focuses on the K-pop YouTube community's internal communication and the scale of its social network. It also reports the characteristics of each vertex in terms of in-degree and out-degree, closeness, betweenness, and eigenvector centrality [30]. The SNA algorithms are shown in Table 2.

In-Degree and Out-Degree Centrality Results
The in-degree of a K-pop YouTube channel is the number of other K-pop YouTube channels that replied to or mentioned it. Based on the in-degree values computed by the Gephi statistics, the top three vertices each had 60 or more arrows pointing to them. The highest-ranked channels, listed from top to bottom, were (1) UCs2t9DlvAuyTBkvk2djGZKw (Genierock), with an in-degree of 71; (2) UC3ZkCd7XtUREnjjt3cyY_gg (WorkpointOfficial), with an in-degree of 65; (3) UC8f7MkX4MFOOJ2SerXLInCA (one31), with an in-degree of 60; and (4) UCjVb9mJZTeuPLN4hOjmvQnw (Bie The Ska) and UC4plRabXFGdAE6HP-tBQKdQ (HEARTROCKER), each with an in-degree of 49. Genierock, WorkpointOfficial, and one31 thus appear to be the most prominent sources in this investigation. The remaining members of this social network perform various "in-between" roles.
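As an illustration of how such degree values are obtained, the following minimal sketch counts in-degree and out-degree from a directed edge list. The edge list and the channel labels (ch_a, ch_hub, and so on) are hypothetical and do not come from the study's data; the study itself relied on Gephi for these statistics.

```python
from collections import Counter

# Hypothetical directed edges: (source channel, channel it replies to or mentions).
edges = [
    ("ch_a", "ch_hub"),
    ("ch_b", "ch_hub"),
    ("ch_c", "ch_hub"),
    ("ch_a", "ch_d"),
    ("ch_hub", "ch_d"),
]


def degree_counts(edge_list):
    """Return (in_degree, out_degree) Counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edge_list:
        out_deg[source] += 1  # arrow leaving the source
        in_deg[target] += 1   # arrow arriving at the target
    return in_deg, out_deg


in_degree, out_degree = degree_counts(edges)
top_in = in_degree.most_common(2)  # channels ranked by incoming arrows
print(top_in)
print(dict(out_degree))
```

A channel with a high in-degree is one that many others address, while a high out-degree marks a channel that addresses many others.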
Popularity is not the only indicator of influence in a social media network, so out-degree centrality was also considered in this investigation. There were only ten channels in the Korean YouTube channel community that communicated specifically with UCMz oAF3aplb-2sF8DHiNzA (ซีรีย์เกาหลียอดฮิต). When the YouTube channels were ranked by out-degree, however, the top channel was UCh3HXrUrB6V0XtZbGWCFGlQ (เกาหลีเกะกะ까오 리 께까), which appears from its channel information to belong to an ordinary citizen. This indicates that it is a popular and very outspoken account that cites several others in the K-pop YouTube channel debate on the platform. The out-degree of a YouTube channel measures the channel's reaction to the network: it is the number of arrows pointing away from the channel, that is, the number of channels it reacts to. When the authoring channel talks to others, it brings them into the conversation or reaches out to them again, even if they were already in the network. An account that is interested in its growth shows that interest to others.
Data on Closeness Centrality. According to the method used in this article, closeness centrality is calculated by summing the shortest-path distances from a node to all other nodes, and the nodes are then ranked according to these distances. This kind of centrality identifies the more prominent actors, those best able to influence the network as a whole. Closeness centrality thus helps identify good 'broadcasters' in a social network. Of the 1,241 channels in the Korean YouTube channel community, just 11.36 percent (141) had a closeness score of 1, while the remaining 88.64 percent scored 0. From this it is inferred that communication in the K-pop YouTube community network is linked in a complex way.
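A closeness score can be sketched with a breadth-first search. The convention assumed here (the number of reachable nodes divided by the sum of shortest-path distances to them, and 0 when nothing is reachable) is one common choice and is not confirmed by the paper as the exact formula used. Under it, a channel that reaches only its direct neighbours scores 1 and a channel with no outgoing links scores 0, which is the kind of 0/1 split reported above. The graph and node names are hypothetical.

```python
from collections import deque


def closeness(graph, start):
    """Reachable nodes divided by the sum of shortest-path distances.

    Returns 0.0 when `start` cannot reach any other node.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:  # first visit gives the shortest distance
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    reachable = len(dist) - 1
    return reachable / total if total else 0.0


# Hypothetical directed graph: node -> nodes it replies to or mentions.
graph = {"a": ["hub"], "b": ["hub"], "c": ["a"], "hub": []}
scores = {node: closeness(graph, node) for node in graph}
print(scores)
```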
Betweenness Centrality. Fig. 1 shows the findings for betweenness centrality in the K-pop YouTube community. This analysis identifies the YouTube channels that play an essential role in bridging the network by following the shortest paths and counting how many times each vertex occurs on them. The faster algorithm for betweenness centrality [31] was used; in the Network Diameter statistics, which comprise betweenness centrality, closeness centrality, and eccentricity, it relies on the graph distance between all pairs of nodes. Information spreads on YouTube through relatively short paths, and the YouTube channels that lie on these short paths control the distribution of information through the social media network. Channels that lie on many shortest paths therefore have high betweenness centrality and are considered powerful gatekeepers of information. Among the K-pop YouTube channels, the highest betweenness centrality was that of UC5zGJZpxeZPFcds5gFcDE7Q (zbing z.), followed by UC9VmaM36k50ZLGkFL5Zh_pg (Aueyauey เอ๋ยเอ้ย) and UCQp9wDCeuLrRs-u7mC0QZVQ (Bstars Music). The YouTube channels described in the preceding subsection were ranked by in-degree centrality. These three YouTubers can thus be counted among the most famous and most prominent social network accounts in the K-pop YouTube community.
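The idea of counting how often a vertex lies on shortest paths can be sketched as follows. This is a compact, unnormalised version of the well-known Brandes procedure for unweighted directed graphs, written for illustration only; it is not taken from the paper or from Gephi, and the toy graph is hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Unnormalised betweenness for an unweighted directed graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Propagate path dependencies back from the farthest nodes to s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical graph: every path from {a, b} to {c, d} passes through "bridge".
graph = {"a": ["bridge"], "b": ["bridge"], "bridge": ["c", "d"], "c": [], "d": []}
scores = betweenness(graph)
print(scores)
```

In this toy graph the node named bridge sits on all four shortest paths between the two sides, so it is the only node with a non-zero score, which mirrors the gatekeeper role described above.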

Fig. 1. Betweenness Centrality distribution
Evidence of the Centrality of Eigenvectors. Eigenvector centrality can be regarded as a "higher-order" form of degree centrality. With eigenvector centrality, a YouTube channel with few links may still have a very high score, because connections to some vertices are more advantageous than connections to others. However, those few connections need to be very well connected themselves to yield a high score. In the investigation of K-pop YouTube channels, the highest eigenvector centrality score was that of UCs2t9DlvAuyTBkvk2djGZKw (Genierock), followed by UCOmHUn--16B90oW2L6FRR3A (BLACKPINK) and UCjVb9mJZTeuPLN4hOjmvQnw (Bie The Ska), which suggests that certain Korean YouTube channels within the community are more helpful to other social network users than others.
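The point that a node with few links can still score highly is easy to check on a toy example. The sketch below uses power iteration and, as a simplifying assumption, treats links as undirected; the graph, the node names, and the fixed number of iterations are all hypothetical choices made for illustration.

```python
def eigenvector_centrality(neighbors, iterations=200):
    """Power iteration on (A + I); scores are scaled so the largest is 1."""
    score = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbours.
        updated = {
            node: score[node] + sum(score[n] for n in neighbors[node])
            for node in neighbors
        }
        largest = max(updated.values())
        score = {node: value / largest for node, value in updated.items()}
    return score


# Hypothetical undirected edges. "x" has one link (to the hub);
# "y" has two links, but only to peripheral nodes.
edges = [
    ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "x"),
    ("a", "b"), ("c", "y"), ("y", "z"),
]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

scores = eigenvector_centrality(neighbors)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

Here x outranks y even though it has fewer links, because its single link goes to the best-connected node.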

Data Analytics and Visualizations
The sociogram in Fig. 2 is laid out in groups. The groups are produced by a clustering algorithm and are distinguished by the relative density of their connections. The clusters tend to bring together groups of vertices with high network activity, and clusters may be merged. This also applies to network members that are very densely clustered; these members are also known as influencers of the network. The grouping likewise gathers network users with lower network activity and sets aside the isolates as discrete instances that are not needed for forming clusters, because they do not interact with anyone on the network. The Clauset, Newman, and Moore algorithm [32] was used to show the relations between these vertices for this analysis and visualization. This algorithm uses modularity as a network property to establish the community structure of the network.
International Journal of Advances in Intelligent Informatics ISSN 2442-6571 Vol. 7, No. 3, November 2021, pp. 329-344
In the case of YouTube channels in K-pop culture, Gephi created 31 groups. The groups were arranged in separate boxes, with the isolates shown together in a single group. Gephi then calculated the clusters based on the criteria used in selecting the groups [33]. The resulting sociogram (Fig. 2) shows the clusters in separate boxes, with connections to particular clusters drawn in a selection of colors. The isolates are located in separate boxes at the top and bottom right-hand corners of the figure. Because they do not communicate in the network, these isolates hardly affect the overall visualization, which is also the reason why the relations appear in a circular shape in the figure. Attention should also be paid to the coordination between the groups. Fig. 2 shows that the main clusters, which have connections to several other nodes in the social network, are concentrated on the left. It can be argued that the principal drawback of this analysis is that it reveals only limited effects. This relates to the apparent lack of public engagement on social media with YouTube channels in the K-pop culture. This apparent lack of popularity on social media, especially on YouTube, may have been affected at the time, both globally and locally, by many other internet news events. At the same time, information overload, particularly from social media, is a reality that will not change in the near future. This gives further research the opportunity to examine the dispersion of social media users, especially those who explicitly influence individuals, during important events.
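The Clauset, Newman, and Moore approach mentioned above greedily merges groups so as to increase modularity. The sketch below does not implement that greedy search; it only computes the modularity Q of a given partition of an undirected, unweighted graph, which is the quantity such an algorithm tries to maximise. The edge list and community labels are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = {}    # L_c: edges with both ends in community c
    degree_sum = {}  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(round(q_split, 4), round(q_single, 4))
```

Splitting the graph into its two triangles gives a clearly positive Q, whereas placing every node in one group gives Q = 0, which is why a modularity-based method prefers the split.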

A specific analysis of Korean popular culture
From the global community of K-pop YouTube channels presented in the findings of phase 1, we selected a total of ten channels, eight of which are Thai-based, for further analysis in this section. The criteria for selecting these channels were: a) that only K-pop content was presented, b) that the channel had more than 200,000 followers or subscribers, and c) that up-to-date news alerts were given. The selection and consensus results are shown in Table 3. We then selected five Thai-based channels and sorted the remaining information according to the findings in Table 3 to further examine common issues related to K-pop and evaluate K-pop's impact on Thai society.

Data collection
Since our analysis indicates that ranking cannot adequately be separated from the subject matter of particular queries, our approach relies on capturing search results, and their variation over time, for a series of chosen queries. We used the 'Video List' module of YouTube Data Tools, which calls the 'search: list' endpoint of the YouTube API v3. Because the 'order' parameter of this endpoint corresponds to the ordering used for user-facing search on the web platform and smartphone apps, our analysis explored the basic ranking process. The capture module was then called for the top 300 results on each of the 44 consecutive days from June 9 to July 22, 2020. The complexities arising from this unique configuration are further elaborated in the subsequent sections of this paper. In the absence of a straightforward methodological solution, researchers must deal creatively with the theoretical possibilities and limitations. We understand that location and personalization play an important role on YouTube, but we keep the API data as close as possible to a 'baseline' point of view.
Our data collection was based on the manual selection of the following five YouTube ID queries: . This collection reflects current and contentious topics, for which the search results can be said to be linked to the politically motivated controversies in which they are discussed. This preference is supported by the practice of studying conflicts as 'situations where actors differ', in which the development of social life can be seen through a high-contrast lens [34]. While it would be important to extend our approach to less heated parts of YouTube, such as interviews, we were primarily looking for queries where we could reasonably expect substantial change over time in response to recent events. Except for Gamergate, which had settled before we started gathering data, all terms pointed to ongoing and deeply divisive disputes that were punctuated by current events, which made changes in ranking over time especially prominent. Our selection also reflected the three authors' existing experience, which allowed us to address the data with more confidence. We understand that our ad hoc queries do not allow us to make general statements about YouTube cultures; instead, they help us develop and test a technique that we intend to apply systematically in future research.
To contextualize the rankings with a metric of public interest, we also collected Google Trends data, which offer statistics on search volume over a predefined period for four Google properties: site search, product search, news, and YouTube. Although the data provided are not absolute search figures but values normalized to the maximum for the period, they still support the measurement of variation over time. In each of our case studies, we were able to examine variations in search volume on YouTube.
Regarding the ethical component of our research, we chose not to anonymize channel identities, either in our underlying data or for publication. On YouTube, the use of real names is not mandated; most notably, the queries we selected did not extend to private spaces where privacy standards pose practical issues, but to channels with many subscribers that address a broader audience. While YouTube retains elements of social networking sites, our queries surfaced a vast array of channels that follow a broadcasting model that is in several respects reminiscent of talk radio. Understanding this new media environment is a matter of essential public interest.
The data for this study were collected from the YouTube API. A script was scheduled to run every day during the year 2020 and retrieved data from the YouTube API about the popular videos of the day. The script then processed the data obtained from YouTube and archived them in text files.
In this article, we examine the trending video clips for Thailand, which comprise a total of 72,994 records. YouTube usually places 200 videos on the trending list every day, which would give 73,000 (365 x 200) records. We obtained 72,994, perhaps because the number of popular videos was marginally less than 200 on a couple of days. On YouTube, the same video can appear on the trending list for several days, so the 72,994 records are not all unique videos: they correspond to 11,177 unique videos. In other words, 11,177 distinct videos were featured on the trending list over the course of the year. Some of them may have been on the list for ten days, others for 20 days, and so on. We return to this in more detail later in the report.
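The distinction between daily records and unique videos can be made concrete with a short sketch. The rows below are hypothetical (date, video ID) pairs standing in for the archived daily snapshots; the real file layout is not described in the paper.

```python
from collections import Counter

# Hypothetical daily snapshots of the trending list: (date, video_id).
rows = [
    ("2020-01-01", "vid1"), ("2020-01-01", "vid2"),
    ("2020-01-02", "vid1"), ("2020-01-02", "vid3"),
    ("2020-01-03", "vid1"),
]

# Each video ID is counted once per day on which it appeared.
days_on_list = Counter(video_id for _, video_id in rows)

total_records = len(rows)          # analogous to the 72,994 records
unique_videos = len(days_on_list)  # analogous to the 11,177 unique videos
longest = days_on_list.most_common(1)[0]

# Arithmetic from the text: 200 videos per day over 365 days.
expected_records = 365 * 200
shortfall = expected_records - 72_994

print(total_records, unique_videos, longest, expected_records, shortfall)
```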

Data analysis
The data analysis for the present study was performed with Python and its community-maintained libraries, including Pandas, Matplotlib, NLTK, ImageAI, WordCloud, and others. The research was carried out in a Jupyter Notebook. Comments posted by followers on video clips in the five YouTube channels were reviewed. The results of the analysis of the five channels are shown in Table 4. The median number of views is 110,382.50, meaning that one-half of the trending videos have fewer views than this figure, while the other half have more. The average number of likes on a trending video is 4,035.49, while the average number of dislikes is 125.92. The average number of comments is 370.17, while the low is 240.00. We can see that the percentage of videos with fewer than 390 views is approximately 71 percent, and the percentage of videos with fewer than 5 million views is roughly 91 percent.
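Because view counts are heavily skewed, the mean and the median can differ widely, and only the median splits the videos into two equal halves. The following sketch, using Python's standard statistics module and a hypothetical list of view counts, shows the difference and how a "share of videos below a threshold" figure is computed.

```python
from statistics import mean, median

# Hypothetical view counts for six trending videos (one viral outlier).
views = [1_200, 3_500, 8_000, 15_000, 40_000, 2_500_000]

avg = mean(views)    # pulled upwards by the outlier
mid = median(views)  # half of the videos lie below, half above

# Share of videos below a chosen (hypothetical) threshold.
threshold = 10_000
share_below = sum(v < threshold for v in views) / len(views)

print(avg, mid, share_below)
```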
Currently, we can see that the bulk of trending videos are 2000 or more with a video peak of 120 or less. Similarly, we can see that the percentage of videos with fewer than 2,000 likes is around 44.33%. We verify the relationship between views and likes by drawing a scatter plot of Viewscount against Likescount.
We see that Viewscount and Likescount (Fig. 3) are indeed positively correlated: as one increases, the other mostly increases too. We see that most trending videos have around 2000/7 ≈ 286 comments, since each division in the graph contains seven histogram bins. Similarly, we can see that the share of videos with fewer than 286 comments is around 57.48%.
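The strength of such a relationship is usually summarised with the Pearson correlation coefficient. The sketch below implements the textbook formula directly; the view and like counts are hypothetical and only illustrate the calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical view and like counts for four videos.
views = [50_000, 120_000, 400_000, 1_000_000]
likes = [900, 4_000, 9_000, 30_000]

r_views_likes = pearson(views, likes)
r_perfect = pearson([1, 2, 3], [2, 4, 6])   # exactly proportional
r_inverse = pearson([1, 2, 3], [6, 4, 2])   # exactly opposite
print(r_views_likes, r_perfect, r_inverse)
```

The coefficient always lies between -1 and 1; values close to 1 mean that the two counts rise together.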

Video title lengths
We now add another column to our dataset to represent the length of each video title and then plot a histogram of title length to get an idea of the lengths of trending video titles. We see that the title-length distribution resembles a normal distribution, with most videos having title lengths of approximately 40 to 65 characters. A scatter plot is then drawn between title length and the number of views to examine the relation between these two variables. Looking at the scatter plot, we may conclude that there is no relation between title length and the number of views. However, we note an interesting pattern: videos with 1,000,000 views or more have a title length of about 30 to 70 characters.
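The title-length step can be sketched without any external library. The titles below are hypothetical, and the 10-character bin width is an arbitrary choice for illustration; note that Python's len counts Unicode code points, so titles in scripts that use combining marks, such as Thai, may count as longer than they appear on screen.

```python
from collections import Counter

# Hypothetical video titles.
titles = [
    "Official Trailer 2020",
    "New song",
    "Game highlights: first challenge",
]

# Add a title_length field to each record, as a new dataset column would.
records = [{"title": title, "title_length": len(title)} for title in titles]
lengths = [record["title_length"] for record in records]

# Histogram with 10-character bins, keyed by the lower edge of each bin.
bins = Counter((length // 10) * 10 for length in lengths)
print(lengths, dict(bins))
```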

Analysis of follower comments in five YouTube channels
As shown in Table 5, the highest number of replies to a comment was 199. The comment with the most replies was about a contest hosted by Busted!, which offered incentives in the form of prizes to engage people in the conversation. Table 6 shows that the comment that garnered the highest number of likes, 9,309, was about women. Fig. 4 shows that, in general, followers of the five K-pop YouTube channels comment on most video clips in a pleasant, amused, and happy manner. Teenage commenters use words like 'cute', 'funny', or 'fun', as well as expressions that indicate laughter (e.g., 555 or lol), to show that something is amusing or pleasing to them. Moreover, female commenters use words like "elder" (i.e., the word 'พี่', which refers to someone older than the speaker) and "woman" next to other positive words or descriptors like "beautiful" to express their positive attitudes.

Results and Discussion
In 2020, a channel named Mink Mink created the most popular videos on YouTube. Content related to product reviews, restaurants, accommodation, and tourist attractions, as well as personal rather than corporate vlogs, is the most popular in the K-pop YouTube community. After Mink Mink, the second and third channels with the most popular videos were entertainment channels. While a video may be on the trending list for only a day or two, six videos were on the list for a total of 44 days. Thep Lee La received the most views, subscribers, likes, and comments. It was, nevertheless, the fourth most disliked video. Even videos with only approximately 50,000 views were included on the trending list, as shown in Fig. 5. In popular video titles, words like "official," "video," "2020," "vs.," "trailer," "music," "game," "new," "highlights," "first," and "challenge" were among the most prevalent. This perhaps suggests that including the current year in the title of a YouTube video may increase its chances of trending. The use of emojis was also very common in popular video titles. Most popular videos have titles that are between 36 and 64 characters long, with a minimum of three and a maximum of 100 characters. A video could become popular even with just seven comments. The YouTube Rewind videos of 2019 and 2020 were the first and second most disliked videos, respectively. Videos take an average of 1.5 days to reach the top of the trending list for the first time. While 5,000 characters is the maximum permitted, most popular videos have descriptions that are between 500 and 1,500 characters long. Many popular videos include social network links in their descriptions. Tags are used in almost all popular videos; on average, there are 21 tags per video.
An object-detection algorithm was applied to the thumbnails of popular videos, and it showed that a person is the most prevalent object in the thumbnails. As a result, one might choose to include a photo of a person in the thumbnail of one's next video.
As shown in Fig. 6, the entertainment category accounted for the most trending videos (approximately 710), followed by people & blogs with some 700 videos, satire with some 200 videos, and so forth. Fig. 7 shows that significantly fewer trending videos are released on Saturdays and Sundays than on other days of the week. Except for Monday, the number of releases on other days of the week is not significantly different from Saturday's. On Monday, only 140 trending videos were released. The graph in Fig. 8 displays the number of trending videos released in each hour of the day, from 12 am (hour 0, 00:00) to 11 pm. We can see that trending videos are mostly published between 9 am and 1 pm, and that the fewest trending videos are published between 12 am and 8 am and between 2 pm and 11 pm. This finding may be due to YouTube's trending video algorithm, or it may be due to the fact that far more videos are released between 2 pm and 7 pm. The two graphs above do not show that videos released on Saturday are more likely to trend, because Saturday may simply be the day on which videos are posted more frequently than on other days. To verify such statements, we would also need to know how many videos (not just trending ones) were released on each day of the week in 2020. The same holds true for hours of the day. We now compare the video statistics discussed in the previous sections, namely how likes relate to the view count and how title length relates to the number of views. A heat map was created to show the relationships between the different measurements. Fig. 10 shows this correlation map, and the correlation matrix shows that views and likes are strongly correlated. In this heat map, lighter shades indicate a higher positive correlation and darker shades a weaker positive correlation.
The strongest possible positive correlation is 1, and the strongest possible negative correlation is -1.
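The weekday and hour-of-day counts discussed in this section can be sketched with the standard library alone. The timestamps below are hypothetical and are assumed to be ISO 8601 strings without a time-zone suffix; whether the study converted publication times to Thai local time is not stated, so no conversion is applied here.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical publication timestamps of trending videos.
published = [
    "2020-06-13T10:15:00",
    "2020-06-15T09:00:00",
    "2020-06-15T22:30:00",
]

stamps = [datetime.fromisoformat(value) for value in published]

# weekday() returns 0 for Monday through 6 for Sunday.
by_weekday = Counter(DAY_NAMES[stamp.weekday()] for stamp in stamps)
by_hour = Counter(stamp.hour for stamp in stamps)

print(dict(by_weekday), dict(by_hour))
```

As noted in the text, such counts describe trending videos only; judging whether a given day or hour makes trending more likely would also require the corresponding counts for all published videos.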

Conclusion
When we looked for ways to explain YouTube's "relevance" algorithm, we found that simple variables like screen size, comment counts, and search results had few clear connections to it. There is no reason to believe that a more detailed multivariate approach would reveal stronger relations, since (a) the single variable that YouTube directly identifies as a factor, time to view, is not accessible from the API, and (b) query-dependent statistical variables like 'videos users who are looking for a word' cannot be replicated from the outside. While it would be worth exploring the extensive practical knowledge gathered by content producers, our deliberately straightforward approach yielded various interesting findings even though the apparent causal factors remain unclear. In light of the Thai Government's growing interest in social impact work and the region's economic growth, we were motivated to map the field's academic landscape and analytically examine its development. We used SNA, data science, and social science tools to characterize the development and nature of the K-pop industry in terms of social media platform, sharing, comments, replies, actions, collaboration features, and analytical focus areas. The SNA approach helps uncover knowledge based mostly on video titles, comment replies, and the presentation of YouTube channels, but it does not provide a comprehensive understanding of the complex patterns of growth and interaction in the rapidly expanding entertainment sector. In future studies, we expect to add more data sources and variables, such as Facebook, Instagram, and Twitter, and to combine qualitative and quantitative methodologies, to provide a more detailed study of Korean popular culture.