Wednesday, February 19, 2025

New Study Shows AI Models Still Cannot Perform Even Low-Level Software Engineering Tasks

OpenAI’s CEO, Sam Altman, says that many companies are incorporating AI into their systems, but they should think carefully before replacing human engineers with AI, because it still cannot do many tasks well. Researchers developed a benchmark called SWE-Lancer to test how well large language models (LLMs) perform on real freelance software engineering tasks. The results showed that LLMs are capable of fixing bugs but often fail to understand what causes them, and they make mistakes as a result.

The researchers tested Claude 3.5 Sonnet and OpenAI’s o1 and GPT-4o on 1,488 freelance software engineering tasks from Upwork, worth a combined $1 million in payouts. The tasks were divided into two categories: management tasks, where the models acted as a manager and chose the best proposed solution, and individual tasks, where the models had to fix bugs and implement features. The results showed that real-world freelance software problems are hard to solve even for advanced AI models, which is why these models are not yet capable of fully replacing humans.

The tasks, selected by the researchers together with 100 other software professionals, were placed in Docker containers without internet access so that the models could not pull existing code from GitHub. The tasks were set up on the Expensify platform, and the researchers generated prompts from the task descriptions. Playwright tests were used to simulate real-world user flows, and these tests were triple-verified by professional engineers to ensure they reliably confirmed whether the models' solutions worked.

The results showed that none of the models could earn the full value of the tasks given to them. The best-performing model, Claude 3.5 Sonnet, earned $280,050 and solved 26.2% of the tasks. All the models performed best on manager tasks, which suggests that AI models can handle the reasoning and technical understanding behind lower-level coding problems, but they still cannot replace low-level engineers.
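Since models are scored by the dollar value of the tasks they complete as well as by the share of tasks solved, the bookkeeping can be sketched in a few lines. This is a minimal illustration, assuming each task is reduced to its payout and a pass/fail result from its end-to-end tests; the `Task` type and every payout below are hypothetical, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record: what the freelance job paid, and whether the
    # model's submission passed all of the task's end-to-end tests.
    payout_usd: int
    passed: bool


def score(tasks: list[Task]) -> tuple[int, float, float]:
    """Return (dollars earned, share of tasks solved, share of total value earned)."""
    total_value = sum(t.payout_usd for t in tasks)
    earned = sum(t.payout_usd for t in tasks if t.passed)
    solved = sum(1 for t in tasks if t.passed)
    return earned, solved / len(tasks), earned / total_value


# Made-up example: four tasks, two of them solved.
tasks = [Task(250, True), Task(1000, False), Task(500, True), Task(8000, False)]
earned, solve_rate, value_share = score(tasks)
print(earned, f"{solve_rate:.1%}", f"{value_share:.1%}")
```

Because payouts vary from task to task, the share of tasks solved and the share of dollars earned can diverge sharply: in this invented example the model solves half the tasks but earns under 8% of the available money.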


Image: DIW-Aigen

by Arooj Ahmed via Digital Information World

TikTok Leads, Instagram Follows, X Struggles in Post Interactions

A new report from Socialinsider examines the average performance of posts on different social networking apps after analyzing 125 million social media posts. The report measured engagement through comments, likes, and other interactions, and found that TikTok performs best. According to the report, TikTok has an engagement rate of 2.50%, the highest among the apps studied, and this engagement is driven by organic, engaging content. In the US, however, TikTok's lead may not be good news, because the app could be banned at any time.

Instagram comes second after TikTok, but its post engagement still cannot compete with TikTok's. Facebook trails even further behind, possibly because Meta is putting more effort into Reels and is trying to reduce external links on the app. X had the worst post engagement, although it performed better last year. There is also some difference in post engagement between brands that pay for X Premium and those that don't.

All of these engagement rates are based on profile performance, which means likes and comments are measured relative to the number of followers a profile has. TikTok posts also receive more likes on average than posts on other apps, as users grow more familiar with the platform. Posts on X receive comparatively few likes, while Instagram ranks second to TikTok for comments. Overall, TikTok leads in post engagement and X trails the field.
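Measuring engagement against followers typically means dividing each post's interactions by the profile's follower count and averaging across posts. The sketch below assumes engagement is likes plus comments plus shares and that the per-post rates are averaged evenly; Socialinsider's exact formula may count or weight interactions differently, and all the numbers are invented.

```python
def post_engagement_rate(likes: int, comments: int, shares: int, followers: int) -> float:
    """Engagement rate by followers for a single post, as a percentage."""
    return (likes + comments + shares) / followers * 100


def profile_engagement_rate(posts: list[tuple[int, int, int]], followers: int) -> float:
    """Average the per-post engagement rates across a profile's posts."""
    rates = [post_engagement_rate(l, c, s, followers) for l, c, s in posts]
    return sum(rates) / len(rates)


# Invented example: a profile with 10,000 followers and three posts,
# each given as (likes, comments, shares).
posts = [(200, 30, 20), (150, 10, 40), (300, 25, 25)]
print(f"{profile_engagement_rate(posts, 10_000):.2f}%")
```

Normalizing by followers is what allows a small account and a very large one to be compared on the same percentage scale.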

For more insights, check out these charts:

Report Confirms TikTok’s Engagement Dominance, X Shows Decline

by Arooj Ahmed via Digital Information World

South Korea Confirms DeepSeek Was Banned After It Sent User Data To ByteDance

DeepSeek turned a lot of heads after it released AI models that many felt were superior to OpenAI's at a fraction of the cost. However, that fame is slowly fading, thanks to some questionable findings.

The latest nation to ban the Chinese startup is South Korea, which confirmed that the decision was taken after the app sent user data to TikTok’s parent firm, ByteDance. The news comes days after the PIPC (Personal Information Protection Commission) said that new downloads of the app were suspended because it failed to comply with the agency’s data protection rules.

The company did set up a legal team in South Korea to look into the matter, and it acknowledged that it had neglected the country’s data protection laws. However, questions remain about which data was sent and to what extent.

Under South Korean law, explicit consent is required from users before their personal information is provided to third parties. DeepSeek had been installed roughly one million times before it was removed from app stores this past weekend.

Italy's data protection authority, Garante, also ordered a probe and blocked the chatbot after DeepSeek failed to address the regulator’s concerns about its privacy policy. Meanwhile, critics have long pointed out that China’s National Intelligence Law gives the government full access to any data it needs from Chinese companies when investigating national security threats or major offenses.

In this respect, the Chinese law is nearly identical to how the US handles such matters: many American businesses must also cooperate with the authorities if and when asked to do so.

Image: DIW-Aigen

by Dr. Hura Anwar via Digital Information World

Facebook No Longer Wants to Be Your Live Video Archive, Store Your Content Elsewhere or Get Ready to Be Deleted

Social media giant Facebook has confirmed that it is imposing a 30-day storage limit on all live videos on the app.

This means that once the 30-day threshold is crossed, live videos will be deleted from the app automatically. In the past, live videos were stored indefinitely, but that will no longer be the case.

The new policy takes effect today, although the first deletions will not happen for several months. Before any video is deleted, Facebook promises to notify users by email. They will then have 90 days to transfer or download the content.

The company recently shared that most live video views happen during the first few weeks after a broadcast, so clearing out older material will reduce service and storage costs. However, the news has not gone down well with many users whose old videos will be wiped out once automated deletion begins.

The company is also providing tools in the app’s interface for downloading videos to a computer or smartphone, along with options to transfer them to cloud storage providers such as Dropbox or Google Drive.

Users can download videos individually or in bulk. The footage can also stay on Facebook indefinitely as Reels, provided the videos are edited into shorter, 90-second clips.

Users can expect these deletions to happen in waves over the next few months. Those who need more time can use the app’s option to defer removal for another six months. After that, if no choice is made, all old live videos will be deleted and can no longer be retained.

Image: FB

by Dr. Hura Anwar via Digital Information World

Tuesday, February 18, 2025

The Demand for Premium Segment Smartphones is Increasing, with Apple Dominating the Market Share

According to new data from Counterpoint Research, 25% of the smartphones shipped in 2024 had an average wholesale price of $600 or more, a sign that the smartphone market is doing well. Consumers are increasingly willing to buy expensive smartphones, with the premium segment's market share rising from 15% in 2020 to 25% in 2024. Apple leads the premium segment with a 67% share, followed by Samsung, Huawei, Xiaomi, and Google.

The market share of the ultra-premium segment (average wholesale price above $1,000) also increased by 40%, as more consumers opt for top-end smartphones. Apple is the top brand in the ultra-premium segment as well, with an average selling price of more than $900. Some reports also noted that device makers are now prioritizing revenue over volume, as the premium segment saw 8% YoY growth, outpacing the overall smartphone market's 5% YoY growth.

The US accounted for the largest share of premium smartphone sales at 25%, followed by China at 24%. India is the largest smartphone market by volume, and its volume has grown fivefold since 2020. Most customers in India cannot afford premium smartphones, but certain policies and trade-in offers make it easier for them to buy expensive models.

As the customer base for premium smartphones grows, demand will continue to rise thanks to advantages like better displays, processors, high-quality cameras, and AI features. To justify the high prices, device makers are also offering future-proof hardware and multi-year software support.

Premium smartphones accounted for 25% of 2024 shipments, with Apple leading the segment, followed by Samsung and Huawei.

Ultra-premium smartphone sales surged 40%, with Apple dominating, while premium segment demand grew 8% YoY globally.

by Arooj Ahmed via Digital Information World

Overtrust in AI Alters Decision-Making, Raising Concerns for Military Applications

According to a recent study published in Scientific Reports, people tend to be overly influenced by AI, even when the AI acknowledges its limitations. For the study, 558 participants took part in two experiments, and the results showed that people trust AI blindly, especially in uncertain situations. One of the researchers, Colin Holbrook, said the finding is concerning and that society should be aware of the risks of overreliance on AI, especially while the technology is still developing.

The researchers designed the experiments to mimic high-pressure, uncertain real-world military decisions. Participants were first shown images of innocent civilians and then an image of a drone strike. They faced a zero-sum dilemma: failing to identify and eliminate enemies could result in civilian deaths, while mistaking civilians for enemies would mean killing innocent people. Participants were shown images with enemy or civilian symbols for just 650 milliseconds, and an AI assisted them in identifying the symbols. Participants were given two opportunities to confirm or change their choices, with the AI offering encouragement.

In the first experiment, the researchers tested whether a physical robot would influence trust more than a virtual one, so in one scenario participants were paired with a full-size, human-like android standing 1.75 meters tall. The physical presence of the robot had little effect on how much participants trusted its advice. The second experiment was conducted online with a larger group: half of the participants interacted with a highly anthropomorphic virtual robot with human-like behavior, while the other half used a basic computer interface that responded only with text. Even the basic AI had a significant influence on participants' decisions.

In both experiments, participants changed their decisions based on the AI's random advice, with 58.3% changing their decisions in the first experiment and 67.3% in the second. Participants' initial choices were correct 70% of the time, but their accuracy dropped to 50% when they followed the AI’s unreliable guidance.
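A drop toward 50% is roughly what one would expect when the advice is a coin flip: on a two-way choice, random advice is right half the time, so fully deferring to it drags accuracy toward 50%. The back-of-the-envelope sketch below assumes the AI agrees or disagrees at random with equal probability and that a participant switches answers with probability `switch_prob` whenever it disagrees. Both parameters are illustrative assumptions, not the study's own model.

```python
def expected_accuracy(initial_accuracy: float, switch_prob: float) -> float:
    """Expected final accuracy on a two-option task when the AI's advice is random.

    The AI agrees or disagrees with the participant's first answer with
    probability 0.5 each. When it disagrees, the participant switches to
    the other option with probability switch_prob.
    """
    a, s = initial_accuracy, switch_prob
    # A correct first answer stays correct unless the AI disagrees AND the participant switches.
    stay_correct = a * (1 - 0.5 * s)
    # A wrong first answer becomes correct only if the AI disagrees AND the participant switches.
    become_correct = (1 - a) * 0.5 * s
    return stay_correct + become_correct


for s in (0.0, 0.5, 1.0):
    print(s, round(expected_accuracy(0.7, s), 3))
```

Under these assumptions, accuracy falls linearly from 70% (never switching) to 50% (always switching when the AI disagrees), which matches the direction of the reported drop.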

When the AI agreed with participants' initial decisions, they felt 16% more confident, but when the AI disagreed, their confidence dropped by 9.48%. Participants who believed the AI was smarter were more likely to trust its judgment. With the U.S. Air Force testing AI co-pilots, it is important to understand and address the risks of excessive reliance on AI, especially in military decisions.


Image: DIW-Aigen

by Arooj Ahmed via Digital Information World

Researchers Uncover YouTube’s True Scale as Google Withholds Platform Insights

If you're an avid YouTube user, you know how open and public-facing the platform has proven to be over the years. Users can find endless content and research almost anything under the sun.

However, one may wonder why the video-sharing platform reveals so few statistics about its own success. For instance, why does it go quiet on simple questions, such as how much content viewers watch, when other topics, such as its algorithm and its effect on today's economy, are so widely discussed? The silence stands out for a platform with more than 2.5 billion monthly users, roughly one in every three people on Earth. On top of that, the average user watches about 29 hours of content each month.

Do the math, and that comes to about 8.3 million years of content watched on YouTube each month. Over a year, this adds up to roughly 100 million years, which is a hundred times greater than the total of human history.
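The watch-time figures follow from a simple unit conversion. A quick check in Python, using the article's 2.5 billion monthly users and 29 hours per user, and assuming an average calendar year of 365.25 days:

```python
HOURS_PER_YEAR = 24 * 365.25  # average calendar year, including leap days

monthly_users = 2.5e9          # "more than 2.5 billion users every month"
hours_per_user_per_month = 29  # average monthly watch time cited above

monthly_hours = monthly_users * hours_per_user_per_month
monthly_years = monthly_hours / HOURS_PER_YEAR  # years of video watched per month
yearly_years = monthly_years * 12               # years of video watched per year

print(f"{monthly_years / 1e6:.1f} million years per month")
print(f"{yearly_years / 1e6:.0f} million years per year")
```

This prints about 8.3 million years per month and about 99 million years per year, consistent with the rounded figures above.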

But the curiosity does not end there. How many videos are actually on the platform, and what are they about? Which languages do most YouTubers speak? Unfortunately, the platform isn't forthcoming about any of that.

This is where the issue lies. Many feel YouTube keeps users in the dark about things they should know. That's partly because there is no simple way to obtain a random sample of videos. You can either take what the algorithm recommends or search manually, which means unbiased samples worth real time and study are difficult to obtain.

Several years ago, a team of researchers including Ethan Zuckerman came up with a solution: a computer program that pulls up videos at random by trying billions of URLs. Some might call it a bot, but that goes too far; Zuckerman feels it's more accurate to call it a scraper.

Surveys show that after two decades of operation, YouTube remains at the top of the list of the most popular apps in America, used by 83% of adults and 93% of teens. By some estimates, it is also the second most popular website in the world, behind only Google.

Now the platform has entered its third decade, yet much about it remains a secret. A spokesperson pointed to a blog post about the recommendation algorithm but declined to comment on the statistics and other issues raised above, so the mystery continues.

It’s hard to know what’s happening inside these platforms because the public disclosures companies make are fragmentary and misleading. Google doesn't want to reveal how large and influential the platform is, or share figures on its users and how much content it hosts. Frankly, it’s almost as if Google doesn't want to acknowledge how much influence it holds over people’s lives.

However, Zuckerman and his team were not deterred. Their program generates random strings of characters and checks whether each one corresponds to a real video. Whenever the scraper finds one, it records it. This works because YouTube URLs follow a standard format. To build a large dataset, the scraper had to go through nearly 18 trillion potential URLs. Despite the enormous number of failed guesses for each video found, the team was eventually able to analyze the results.
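The underlying idea is random sampling: if IDs are guessed uniformly at random, the fraction of guesses that hit a real video estimates the fraction of the whole ID space in use. The simplified sketch below assumes IDs drawn from a 64-symbol URL-safe alphabet and replaces real lookups with a hypothetical `video_exists` check backed by a toy in-memory set. It illustrates the estimate only; the researchers' actual scraper may use refinements not shown here.

```python
import random
import string

# URL-safe 64-symbol alphabet (A-Z, a-z, 0-9, '-', '_'); an assumption about the ID format.
ALPHABET = string.ascii_letters + string.digits + "-_"


def random_id(length: int, rng: random.Random) -> str:
    """Generate one candidate video ID uniformly at random."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def estimate_total(video_exists, id_length: int, guesses: int, seed: int = 0) -> float:
    """Estimate how many IDs are in use from the hit rate of random guesses.

    video_exists is a hypothetical callable standing in for a real lookup.
    """
    rng = random.Random(seed)
    hits = sum(video_exists(random_id(id_length, rng)) for _ in range(guesses))
    space = len(ALPHABET) ** id_length  # total number of possible IDs
    return hits / guesses * space


# Toy stand-in: 3-character IDs (262,144 possibilities) with about 5,000 "real" videos.
rng = random.Random(42)
toy_videos = {random_id(3, rng) for _ in range(5_000)}
estimate = estimate_total(lambda vid: vid in toy_videos, id_length=3, guesses=200_000)
print(len(toy_videos), round(estimate))
```

With real 11-character IDs the space under this assumption is 64^11, about 7.4 × 10^19, so hits are extremely rare, which is why such a scraper needs trillions of guesses.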

Among the secret stats they uncovered is the number of videos users have uploaded to the platform, a figure Google used to share but no longer does. By the middle of 2024, it stood at 14.8 billion videos, a 60% rise over previous years.

Although YouTube was originally created to serve regular people, the company is now more focused on serving professional creators than anyone else. The recent scraping project by Zuckerman’s lab shows that the platform is actually less like television and more like infrastructure.

Take a look at the charts below for more insights:

Key takeaways from the charts above:

The first chart illustrates the distribution of estimated views per YouTube video, showing that most videos receive relatively few views. The highest frequency occurs in the 17-32 views range, with a peak of around 10-11%. The majority of videos fall below 2,048 views, while only a tiny fraction surpasses a million.
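The ranges mentioned for the first chart (17-32 views, a cutoff at 2,048) look like doubling, power-of-two bins, a common way to plot heavily skewed counts. A small sketch of that binning, assuming each bin runs from just above one power of two up to the next; the chart's exact bin edges are an assumption, and the view counts are invented.

```python
from collections import Counter


def view_bin(views: int) -> str:
    """Label the power-of-two bin a view count falls into, e.g. 20 -> '17-32'."""
    if views <= 1:
        return str(views)  # 0 and 1 get their own bins
    upper = 1 << (views - 1).bit_length()  # smallest power of two >= views
    lower = upper // 2 + 1
    return f"{lower}-{upper}"


# Invented view counts for a handful of videos.
views = [0, 3, 18, 25, 31, 40, 1500, 2_000_000]
print(Counter(view_bin(v) for v in views))
```

Doubling bins keep a distribution that spans a few views to millions readable on a single axis.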

The second chart demonstrates YouTube's rapid expansion, growing from under a billion videos in 2010 to over 14 billion by 2024. The increase has been particularly sharp since 2018, reflecting YouTube’s accelerating content production.

The third graphic highlights language distribution, with English dominating at nearly 30%, followed by Hindi (around 10%), Spanish, Portuguese, and Russian, each contributing approximately 5-10%. Other languages like Arabic, Japanese, and Bengali hold smaller shares, with diverse representation across global languages.

H/T: BBC

by Dr. Hura Anwar via Digital Information World