LWN.net Weekly Edition for May 29, 2025
Welcome to the LWN.net Weekly Edition for May 29, 2025
This edition contains the following feature content:
- Glibc project revisits infrastructure security: what is the best way to keep the GNU C Library code secure?
- Cory Doctorow on how we lost the internet: a PyCon keynote on just how things went wrong.
- System-wide encrypted DNS: work that has been done to make encrypted DNS just work on enterprise distributions.
- Development statistics for the 6.15 kernel: a look at where the code for this release came from.
- Ongoing LSFMM+BPF 2025 coverage:
- Long-duration stress-testing for filesystems: a discussion on filesystem testing aimed at finding more bugs before they are discovered in a production setting.
- Formally verifying the BPF verifier: a look at using Agni to prove parts of the BPF verifier correct.
- Verifying the BPF verifier's path-exploration logic: Srinivas Narayana shares a plan to extend Agni to cover more of the verifier.
- Allowing BPF programs more access to the network: new functions to allow BPF programs to send data over the network, and cleanly disconnect TCP connections.
- Reports from OSPM 2025, day two: discussions on improvements to device suspend and resume, the status and future of sched_ext, the scx_lavd scheduler, improving the efficiency of load balancing, and hierarchical constant bandwidth server scheduling.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Glibc project revisits infrastructure security
The GNU C Library (glibc) is the core C library for most Linux distributions, so it is a crucial part of the open-source ecosystem—and an attractive target for any attackers looking to carry out supply-chain attacks. With that being the case, securing the project's infrastructure using industry best practices and improving the security of its development practices are frequent topics among glibc developers. A recent discussion suggests that improvements are not happening as quickly as some would like.
On May 9, glibc maintainer Carlos O'Donell wrote
to the libc-alpha mailing list to ask other glibc developers to review
a secure
software development life-cycle process document that he
had drafted for glibc. He also provided a similar top-level document for the
GNU toolchain that includes GNU Binutils, GCC, glibc, and the GNU Project Debugger (GDB). The
goal is to define "what we expect from the infrastructure,
developer end points, and our process
" in order to figure out what
services are needed to create a more secure development process.
The glibc project is hosted on Sourceware, which provides project
hosting for free-software toolchain
and developer tools, including those that comprise the GNU
Toolchain. O'Donell noted that some of the items in his document were
taken from the Sourceware Cyber Security FAQ, specifically its section on "suggested secure development policies for projects", but had been rearranged
into a structure that matched the NIST Secure Software
Development Framework, which is the standard he recommended
as "the simplest and least prescriptive
".
In a nutshell, the document suggests top-level practices to be
adopted "in order to develop a secure and robust GNU C
Library
". This includes treating infrastructure as a zero-trust
environment in which it is assumed that any of the services,
developers, administrators, or systems have been compromised and
attempting to limit the consequences of such a compromise. It carries
a host of recommendations such as defining security requirements for
developers, implementing security tooling, and separating services
into distinct systems or VMs.
Hosting
O'Donell emphasized that he was not talking about where to
host the project's infrastructure, though the document does discuss
hosting. This was worth noting, as the topic of toolchain
infrastructure and hosting has come up a number of times, almost as an
annual ritual at this point. It was, for example, raised
in 2023 by O'Donell, and again
in 2024, with O'Donell unsuccessfully trying to drive a core-toolchain-infrastructure
(CTI) project that would have moved some of glibc's core collaboration
services to infrastructure managed by the Linux Foundation. The statement
of work for CTI proposed moving glibc infrastructure to a cloud
vendor, adding multiple points of redundancy, as well as 24/7 monitoring
and engineering support to handle service outages or "high-risk
security events
". The annual running cost for CTI was estimated at
$276,000.
Sourceware became a member of the non-profit Software Freedom Conservancy (SFC) in 2023, in part a response to a push by O'Donell and others to move GNU Toolchain services to the Linux Foundation in 2022. The current costs of glibc's infrastructure are somewhat nebulous, as much of Sourceware's infrastructure is provided as donations rather than billed by a provider.
The infrastructure and services for Sourceware are managed by volunteers, with hardware, bandwidth, hosting, and other services donated by a number of individuals, companies, and other organizations. Red Hat provides the main server (singular) for several services, as well as a backup server should that one fail. The Oregon State University Open Source Lab, which recently had a funding scare, hosts the server that provides automated source and documentation snapshots. There are a number of machines provided and administered by other organizations and individuals for building software on various architectures, such as arm64, RISC-V, and s390x.
Mark Wielaard, who serves on the Sourceware project leadership committee, posted a report on Sourceware's second year with the SFC on May 27. According to that report, Sourceware's total income over the last year was about $3,000 from personal donations, and it has spent about $240 on PayPal fees and spare disks for its servers. In total, it has a little more than $10,000 in the bank.
According to Sourceware's infrastructure
security page, the site hosts more than 25 projects that have more
than 350 active developers and 1,000 contributors. The page has a plans
section at the bottom with a list of high-level goals to improve
the security of Sourceware's processes and services. This includes
isolating services, modernizing account-management processes,
improving the release-upload process, and hiring a part-time junior
system administrator. The list of plans is unchanged since the page
was first
captured by the Internet Archive on May 28, 2024. Wielaard
noted in his report that Sourceware is looking for sponsors to help
"accelerate
" its security plans.
CTI
The CTI discussion in 2024 was contentious, with glibc maintainers
objecting to both the way the proposal was developed and the choice of
Linux Foundation services. Zoë Kooyman weighed
in on behalf of the Free Software Foundation (FSF) to say that it
opposed the effort to move glibc to CTI. She noted that the proposal
would mean that only Linux Foundation IT staff would have
administrative access to the servers for CTI, thus no one outside the
foundation would be able to improve, maintain, or audit the
infrastructure. Sourceware, on the other hand, "accepts
technical contributions, and LF IT could be making them right
now
".
Andrew Pinski asked
why the proposal was not developed on the glibc development list, and
said that it "gives the vibes of being too closed and being done in
a rush
" without thinking the proposal through. Alfred M. Szmidt complained
that it smelled like a corporate push and was not something the
community wanted. Wielaard questioned
why O'Donell was "pushing for something that was already highly
controversial
" and received negatively when it had been proposed
before:
I thought we had consensus that the community wasn't really helped by setting up a corporate controlled directed fund or by having a highly disruptive change of infrastructure providers. [...]
Personally, as a glibc developer, I don't want a messy migration of some of the services separating glibc from the rest of the core toolchain and developer tool projects hosted at Sourceware. And looking at some of the other replies I think there is sustained opposition to this idea.
That opposition has not abated. Pinski said
of the new proposal that the glibc document was less about
security and more about pushing glibc toward the CTI project. He said
that it would be better to step back and discuss glibc's model of
submitting patches and approvals. Wielaard thought
that the proposed policy would be better and clearer if it
concentrated solely on the secure-development process. "We have
better/separate documents for the hosting infrastructure security
parts.
"
Isolation
Joseph Myers, however, worried
about Sourceware running many services that were not isolated to
separate virtual machines or containers. That may have been fine 25
years ago, but the project should assume now that it is "at risk of
targeted attacks from state-level actors
". Its practices were
outdated ten years ago, he said, and certainly outdated when he raised
similar concerns during GNU Cauldron in
2022. That was likely a reference to a Birds-of-a-Feather
session on Sourceware's toolchain infrastructure that included a
presentation
by O'Donell and David Edelsohn about using managed services from the
Linux Foundation. LWN covered this session,
which was "loud, contentious, and bordered on physical violence at one point
".
In 2022, Myers floated the idea that Sourceware administrators could
move to "a modern high-security setup with isolated services
",
so that compromises of one project or service would not impact other
projects or services. He said he had not seen much progress on
isolation since 2022, though there had been a few security
improvements, such as disabling inactive user accounts:
If Sourceware doesn't do such a migration to more secure, isolated hosting of services (within a reasonable time starting from 2022), that also serves as evidence as far as I'm concerned of the advantages of different hosting arrangements. If in fact lots of such migrations have happened since the 2022 Cauldron and are continuing to happen for the last few unisolated services, that serves as evidence that Sourceware provides suitable hosting arrangements but needs to work on improving how configuration changes and administrative actions get reported to the hosted projects.
Wielaard said that the Sourceware organization was working on it, though progress might not be as fast as Myers might like. Sourceware had started isolating processes using systemd services and resource controls, and there would be an opportunity to move to separate containers or VMs in Q3 when Red Hat's "community cage" servers move to a datacenter in Raleigh, NC. (An update on this move, specific to Fedora's services, was posted in April by Kevin Fenzi on the Fedora Community Blog.)
Security check list
While much of the conversation focused on the project's hosting
infrastructure, there was some discussion of the other elements of
O'Donell's document. Wielaard questioned
whether the NIST format was the right one. It contains useful
elements, he said, but "in general it isn't
really a good way for a community project to document its cyber
security practices
". He added that the topic of a "secure
development policy champion
" had come up during the Sourceware
office hours the day before.
O'Donell replied
and volunteered to be glibc's secure-development policy champion. He
disagreed that the NIST framework was not suitable for glibc, and
pointed to a document
he had created that compared Sourceware's cybersecurity policy to
NIST's framework. His analysis concludes that items in Sourceware's
checklist "do not clearly flow from any top-level requirements for
security e.g. why would I do this particular step?
", and
recommends that the checklist should be rewritten to match NIST's
framework.
Wielaard said
he appreciated that there were interesting points from NIST, but a
free-software project is unlike the organizations described in its
document. "Pretending it is distracts from the strengths of
collaboratively working together on Free Software.
" He added that
Sourceware had been mostly looking at the European Union Cyber
Resilience Act (CRA), and the checklist aimed to help create a
documented, verifiable project security policy to prepare for the CRA
becoming law. He said that it was great that O'Donell was volunteering:
the best way forward would be to go over the checklist to document
things that are already implemented or how to adopt any item that
glibc is not already doing:
At the meeting several people said that we shouldn't mandate any specific policy item, but that we should look at making it attractive for contributors to follow policies because they agree it is good for the project as a whole. At the moment only the retiring of inactive accounts is mandated.
One suggestion was to use some kind of gamification between projects to see who did most. e.g. each quarter we publish a "signed-commit census report". We could turn that into a kind of leaderboard by sorting the projects by number of signed commits or number of people pushing signed commits. Last quarter glibc had just 8% of signed commits, that percentage could certainly be higher for Q2!
With that, the conversation seems to have sputtered out for now. The matter of glibc process security will, no doubt, come up again in the future. The project, and Sourceware, do seem to be inching toward better security practices and more secure infrastructure. However, the current status is less than comforting given the importance of glibc and the overall GNU Toolchain. Given the history of attacks on free-software projects (like last year's XZ backdoor) and infrastructure, one might expect a little more urgency (and industry support) in seeing to those improvements.
Cory Doctorow on how we lost the internet
Cory Doctorow wears many hats: digital activist, science-fiction author, journalist, and more. He has also written many books, both fiction and non-fiction, runs the Pluralistic blog, is a visiting professor, and is an advisor to the Electronic Frontier Foundation (EFF); his Chokepoint Capitalism co-author, Rebecca Giblin, gave a 2023 keynote in Australia that we covered. Doctorow gave a rousing keynote on the state of the "enshitternet"—today's internet—to kick off the recently held PyCon US 2025 in Pittsburgh, Pennsylvania.
He began by noting that he is known for coining the term
"enshittification" about the decay of tech platforms, so attendees were
probably expecting to hear about that; instead, he wanted to start by
talking about nursing. A recent
study described how nurses are increasingly getting work through one of
three main apps that "bill themselves out as 'Uber for nursing'
".
The nurses never know what they will be paid per hour prior to accepting a
shift and the three companies act as a cartel in order to "play all
kinds of games with the way that labor is priced
".
In particular, the
companies purchase financial information from a data broker before offering
a nurse a shift; if the nurse is carrying a lot of credit-card debt,
especially if some of that is delinquent, the amount offered is
reduced. "Because, the more desperate you are, the less you'll accept to
come into work and do that grunt work of caring for the sick, the elderly,
and the dying.
" That is horrific on many levels, he said, but "it
is emblematic of 'enshittification'
", which is one of the reasons he
highlighted it.
Platform decay
Enshittification is a three-stage process; he used Google to
illustrate the idea. At first, Google minimized ads and maximized spending
on engineering to produce a great search engine; while it was doing that,
however, it was buying its way to dominance. "They bribed every service,
every product
that had a search box to make sure that that was a Google search box.
"
No matter which browser, phone carrier, or operating system you were using,
Google ensured that you were using its search by default; by the early
2020s, it was spending the equivalent of buying a Twitter every 18 months
to do so, he said. That is the first stage of the process: when the
provider is being good to its users, but is finding ways to lock them in.
![Cory Doctorow [Cory Doctorow]](png/pycon-doctorow-sm.png)
The second phase occurs once the company recognizes that it has users
locked in, so it will be difficult for them to switch away, and it shifts
to making things worse for its users in order to enrich its business
customers. For Google, those are the publishers and advertisers. A
growing portion of the search results page is shifted over to ads
"marked off with ever-subtler, ever-smaller, ever-grayer labels
distinguishing them from the organic search results
". While the
platform is getting better for business customers—at the expense of the
users—those customers are also getting locked in.
Phase three of enshittification is when the value of the platform is
clawed back until all that is left is kind of a "homeopathic residue—the
least value needed to keep both business customers and end users locked to
the platform
". We have gained a view into this process from the three
monopoly cases that Google has lost over the last 18 months. In 2019, the
company had 90% of the world's search traffic and its users were loyal;
"everyone who searched on Google, searched everything on Google
".
But that meant that Google's search growth had plateaued, so how was the
company going to be able to grow? It could "raise a billion humans to
adulthood and make them Google customers, which is Google Classroom, but that's a
slow process
". From the internal memos that came to light from the
court cases, we can see what the company chose to do, he said: "they
made search worse
".
The accuracy of the search results was reduced, which meant that users
needed to do two or three queries to get the results they would have
seen on the first page. That increased the number of ads that could be
shown, which is obviously bad for searchers, but the company was also
attacking its business customers at the same time. For example, "Google entered into
an illegal, collusive arrangement with Meta, called Jedi Blue
" that
"gamed the advertising market
" so that publishers got paid less and
advertisers had to pay more, he said.
So that's how we have ended up at the Google of today, where the top of the
search results page is "a mountain of AI slop
", followed by five
paid results "marked with the word 'Ad' in eight point, 90%
gray-on-white type
", ending with "ten spammy SEO [search-engine
optimization] links from someone else who's figured out how to game
Google
". The amazing thing is "that we are still using Google
because we're locked into it
". It is a perfect example of the result
of the "tragedy in three acts
" that is enshittification.
Twiddling
The underlying technical means that allows this enshittification is
something he calls "twiddling". Because the companies run their apps on
computers, they can change a nearly infinite number of knobs to potentially
alter "the prices, the cost, the search rankings, the
recommendations
" each time the platform is visited. Going back to the
nursing example, "that's just twiddling, it's something you can only do
with computers
".
Legal scholar Veena Dubal coined the term "algorithmic
wage discrimination" to describe this kind of twiddling for the "gig
economy", which is "a major locus for enshittification
"; the nursing
apps, Uber, and others are examples of that economy. "Gig work is that
place where your shitty boss is a shitty app and you're not allowed to call
yourself an employee.
"
Uber invented a particular form of algorithmic wage discrimination; if its
drivers are picky about which rides they accept, Uber will slowly raise the
rates to entice those drivers—until they start accepting rides. Once a
driver does accept a ride, "the wage starts to push down and down at
random intervals in increments that are too small for human beings to
readily notice
". It is not really "boiling the frog
", Doctorow
said, so much as it is "slowly poaching it
".
As anyone with a technical background knows, "any task that is simple,
but time-consuming is a prime candidate for automation
". This
kind of "wage theft
" would be tedious and expensive to do by hand,
but it is trivial to play these games using computers. This kind of thing
is not just bad for nurses, he said; it's bad for those who are using their
services.
Do you really think that paying nurses based on how desperate they are, at a rate calculated to increase their desperation so that they'll accept ever-lower wages, is going to result in us getting the best care when we see a nurse? Do you really want your catheter inserted by a nurse on food stamps who drove an Uber until midnight the night before and skipped breakfast this morning so that they could pay the rent?
Paying and products
It is misguided to say "if you're not paying for the product, you're the
product
", because it makes it seem like we are complicit in sustaining
surveillance
capitalism—and we are not. The thinking goes that if we were only
willing to start paying for things, "we could restore capitalism to its
functional non-surveillance state and companies would treat us better
because we'd be customers and not products
". That thinking elevates
companies like Apple as "virtuous alternatives
" because the company
charges money and not attention, so it can focus on improving the
experience for its customers.
There is a small sliver of truth there, he said; Apple rolled out a feature
on its phones that allowed users to opt out of third-party
surveillance—notably Facebook tracking. 96% of users opted out, he said;
the other 4% "were either drunk or Facebook employees or drunk Facebook
employees
".
So that makes it seem like Apple will not treat its customers as products,
but at the same time that it added the opt-out, the company secretly started gathering
exactly the same information for its "own surveillance
advertising network
". There was no notice given to users and no way to
opt out of that surveillance; when journalists discovered it and published
their findings, Apple "lied about it
". The "$1000 Apple
distraction rectangle in your pocket is something you paid for
", but
that does not stop Apple from "treating you like the product
".
It is not just end users that Apple treats like products; the app vendors
are also treated that way with 30% fees for payment processing in the App
Store. That's what is happening with gig-app nurses: "the nurses are the
product, the patients are the product, the hospitals are the product—in
enshittification, the product is anyone you can productize
".
While it is tempting to blame tech, Doctorow said, these companies did not
start out enshittified. He recounted the "magic
" when Google debuted;
"you could ask
Jeeves questions for a thousand years and still not get an answer as
crisp, as useful, as helpful as the answer you would get by typing a few
vague keywords
" into Google. Those companies spent decades producing
great products, which is why people switched to Google, bought iPhones, and
joined their friends on Facebook. They were all born digital, thus could
have enshittified at any time, "but they didn't, until they did, and
then they did it all at once
".
He believes that changes to the policy environment, not changes in technology, are what led to enshittification. These changes to the rules of
the game were "undertaken in living memory by named parties who were
warned at the time of the likely outcomes
"—and did it anyway.
Those people are now extremely rich and respected; they have "faced no
consequences, no accountability for their role in ushering in the
Enshittocene
". We have created a perfect breeding ground for the worst
practices in our society, which allowed them to thrive and dominate
decision-making for companies and governments "leading to a vast
enshittening of everything
".
That is a dismal outlook, he said, but there is a bit of good news hidden
in there. This change did not come about because of a new kind of evil
person or the weight of history, but rather because of specific policy
choices that were made—and can be unmade. We can consign the enshitternet
to the scrap heap as
simply "a transitional state from the old good internet that we used to
have and the new good internet that we could have
".
All companies want to maximize profits and the equation to do so is simple:
charge as much as you can, pay suppliers and workers as little as you can,
and spend the smallest amount possible on quality and safety. The
theoretically "perfect" company that charges infinity and spends nothing
fails because no one wants to work for it—or buy anything from it. That
shows that there are external constraints that tend to tamp down the
"impulse to charge infinity and deliver nothing
".
Four constraints
In technology, there are four constraints that help make companies better; they help push back against the impulse to enshittify. The first is markets; businesses that charge more and deliver less lose customers, all else being equal. This is the bedrock idea behind capitalism and it is also the basis of antitrust law, but the rules on antitrust have changed since the Sherman Antitrust Act was enacted in 1890. More than forty years ago, during the Reagan administration in the US, the interpretation of what it means to be a monopoly was changed, not just in the US, but also among its major trading partners in the UK, the EU, and Asia.
Under this interpretation, monopolies are assumed to be efficient; if
Google has 90% of the market, it means that it deserves to be there because
no one can possibly do search any better. No competitor has arisen because
there is no room to improve on what Google is doing. This pro-monopoly
stance did exactly what might be expected, he said, it gave us more
monopolies: "in pharma, in beer, in glass bottles, vitamin C, athletic
shoes, microchips, cars, mattresses, eyeglasses, and, of course,
professional wrestling
", he said to laughter.
Markets do not constrain technology firms because those firms do not compete
with their rivals—they simply buy their rivals instead. That is confirmed
by a memo from Mark Zuckerberg—"a man who puts all of his dumbest ideas
in writing
"—who wrote: "It is better to buy than to compete
".
Even though that anti-competitive behavior came to light before Facebook was allowed to buy Instagram (a purchase meant to ensure that users switching away would still be part of the Facebook platform), the Obama administration
permitted the sale. Every government over the past 40 years, of all political stripes, has treated monopolies as efficient,
Doctorow said.
Regulation is also a constraint, unless the regulators have already been
captured by the industry they are supposed to oversee. There are several
examples of regulatory
capture in the nursing saga, but the most egregious is that anyone in
the US can obtain financial information on anyone else in the country,
simply by contacting a data broker. "This is because the US congress
has not passed a new consumer privacy law since 1988.
" The Video
Privacy Protection Act was aimed at stopping video-store clerks from
telling newspapers what VHS video titles were purchased or rented, but no
protections have been added since then.
The reason congress has not addressed privacy legislation "since Die
Hard was in its first run in theaters
" is neither a coincidence
nor an oversight, he said. It is "expensively purchased inaction
"
by an industry that has "monetized the abuse of human rights at
unimaginable scale
". The coalition in favor of freezing privacy law
keeps growing because there are so many ways to "transmute the
systematic invasion of our privacy into cash
".
Tech companies are not being constrained by either markets or governments,
but there are two other factors that could serve to tamp down "the
reproduction of sociopathic, enshittifying monsters
" within these
companies. The first is interoperability; in the non-digital world, it is
a lot of work to, say, ensure that any light bulb can be used with any
light socket.
In the digital world, all of our programs run on the same
"Turing-complete, universal Von Neumann machine
", so a program that
breaks interoperability can be undone with a program that restores it.
Every ten-foot fence can be surmounted with an eleven-foot ladder; if HP writes
a program to ensure that third-party ink cannot be used with its printers, someone
can write a program to undo that restriction.
DoorDash workers generally make their money on tips, but the app hides the
amount of the tip until the driver commits to taking the delivery. A
company called Para wrote a program that looked inside the JSON that was
exchanged to find the tip, which it then displayed before the driver
had to commit. DoorDash shut down the Para app, "because in America,
apps like Para are illegal
". The 1998 Digital
Millennium Copyright Act (DMCA) signed by Bill Clinton "makes it a
felony to 'bypass an access control for a copyrighted work'
". So even
just reverse-engineering the DoorDash app is a potential felony, which is
why companies are so desperate to move their users to apps instead of web
sites. "An app is just a web site that we have wrapped in a correct
DRM [digital
rights management] to make it a felony to protect your privacy while
you use it
", he said to widespread applause.
At the behest of the US trade representative, Europe and Canada have also
enacted DMCA-like laws. This happened despite experts warning the leaders
of those countries that "laws that banned tampering with digital locks
would let American tech giants corner digital markets in their
countries
". The laws were a gift to monopolists and allowed companies
like HP to continually raise the price of ink until it "has become the
most expensive substance you, as a civilian, can buy without a permit
";
printing a shopping list uses "colored water that costs more than the
semen of a Kentucky-Derby-winning
stallion
".
The final constraint, which did hold back platform decay for quite some
time, is labor. Tech workers have historically been respected and
well-paid, without unions. The power of tech workers did not come from
solidarity, but from scarcity, Doctorow said. The minute bosses ordered
tech workers to enshittify the product they were loyally working on,
perhaps missing various important social and family events to
ship it on time, those workers could say no—perhaps in a much more coarse
way. Tech workers could simply walk across the street "and have a new
job by the end of the day
" if the boss persisted.
So labor held off enshittification after competition, regulation, and interoperability were all systematically undermined, and it did so for quite some time—until the mass tech layoffs. There have been half a million tech workers laid off since 2023, and more layoffs are announced regularly, sometimes in conjunction with raises for executive salaries and bonuses. Now, workers cannot turn their bosses down because there are ten others out there just waiting to take their job.
Reversing course
Until we fix the environment we find ourselves in, the contagion will
spread to other companies, he said. The good news is that after 40 years
of antitrust decline, there has been a lot of worldwide antitrust activity
and it is coming from all over the political spectrum. The EU, UK,
Australia, Germany, France, Japan, South Korea, "and China, yes,
China
" have passed new antitrust laws and launched enforcement actions.
The countries often collaborate, so a UK study on Apple's 30%
payment-processing fee was used by the EU to fine the company for billions
of euros and ban Apple's payment monopoly; those cases then found their way
to Japan and South Korea where Apple was further punished.
"There are no billionaires funding the project to make billionaires
obsolete
", Doctorow said, so the antitrust work has come from and been
funded by
grassroots efforts.
Europe and Canada have passed strong right-to-repair legislation, but those
efforts "have been hamstrung by the anti-circumvention laws
" (like
the DMCA). Those laws can only be used if there are no locks to get
around, but the manufacturers ensure that every car, tractor, appliance,
medical implant, and hospital medical device has locks to prevent repair.
That raises the question of why these countries don't repeal their versions
of the DMCA.
The answer is tariffs, it seems. The US trade representative has long
threatened countries with tariffs if they did not have such a law on their
books. "Happy 'Liberation Day' everyone
", he said with a smile,
which resulted in laughter, cheering, and applause. The response of most
countries when faced with the US tariffs (or threats thereof) has been to
impose retaliatory tariffs, making US products more expensive for their
citizens, which is a weird way to punish Americans. "It's like punching
yourself in the face really hard and hoping someone else says 'ouch'.
"
What would be better is for the countries to break the monopolies of the US
tech giants by making it legal to reverse-engineer, jailbreak, and modify
American products and services. Let companies jailbreak Teslas and deliver
all of the features that ship in the cars, but are disabled by software,
for one price; that is a much better way to hurt Elon Musk, rather than by
expressing outrage at his Nazi salutes, since he loves the
attention. "Kick him in the dongle.
"
Or, let
a Canadian company set up an App Store that only charges 3% for payment
processing, which will give any content producer an immediate 25% raise, so
publishers will flock to it. The same could be done for car and tractor
diagnostic devices and more.
"Any country in the world has it right now in their power to become a
tech-export powerhouse.
"
Doing so would directly attack the tech giants in their most profitable
lines of business: "it takes the revenues
from those rip-off scams globally from hundreds of billions of dollars to
zero overnight
". And "that is how you win a trade war
", he said
to more applause.
He finished with a veritable laundry list of all of the ills facing the
world today (the "omni-shambolic poly-crisis
"), both on and off the
internet, and noted that the tech giants
would willingly "trade a habitable planet and human rights for a 3% tax
cut
". But it did not have to be this way, "the enshitternet was not
inevitable
" and was, in fact, the product of policy choices made by
known people in the last few decades. "They chose enshittification; we
warned them what would come of it and we don't have to be eternal prisoners
of the catastrophic policy blunders of clueless lawmakers of old.
"
There once was an "old good internet
", Doctorow said, but it was
too difficult for non-technical people to connect up to; web 2.0 changed
that, making it easy for everyone to get online, but that led directly into
hard-to-escape walled gardens. A new good internet is possible and needed; "we can
build it with all of the technological self-determination of the old good
internet and the ease of web 2.0
". It can be a place to come together
and organize in order to "resist and survive climate collapse, fascism,
genocide, and authoritarianism
". He concluded: "we can build it and
we must
".
His speech was well-received and was met with a standing ovation. Some of his harshest rhetoric (much of which was toned down here) may not have been popular with everyone, perhaps especially the PyCon sponsors who were named and shamed in the keynote, but it did seem to resonate within the crowd of attendees. Doctorow's perspective is always interesting—and he certainly pulls no punches.
A YouTube video of the talk is available.
[I would like to thank LWN's travel sponsor, the Linux Foundation, for supporting my travel to Pittsburgh for PyCon.]
System-wide encrypted DNS
The increasing sophistication of attackers has organizations realizing that perimeter-based security models are inadequate. Many are planning to transition their internal networks to a zero-trust architecture. This requires every communication on the network to be encrypted, authenticated, and authorized. This can be achieved in applications and services by using modern communication protocols. However, the world still depends on Domain Name System (DNS) services where encryption, while possible, is far from being the industry standard. To address this we, as part of a working group at Red Hat, worked on fully integrating encrypted DNS for Linux systems—not only while the system is running but also during the installation and boot process, including support for a custom certificate chain in the initial ramdisk. This integration is now available in CentOS Stream 9, 10, and the upcoming Fedora 43 release.
Zero-trust architecture
A common perimeter-based approach separates the network into two sectors—internal and external. While the external network is usually not trusted, there is an implicit trust in the internal network. Even though it is quite common to authenticate users to the services, any host and any communication inside the internal network is assumed to be trustworthy; therefore, there is no mutual authentication and no perceived need for data encryption.
There is an increased risk of cyberattacks every year, and the designation of "internal" and "external" for network-connected devices is much less useful today. Companies are moving resources from internal networks into public clouds, and employees are working remotely or on devices not owned by the enterprise thanks to "bring your own device" policies. Implicit trust in "internal" networks is no longer acceptable, if it ever was.
Over the years, new extensions have been added to the DNS protocol to enhance its security. Domain Name System Security Extensions (DNSSEC) adds verification and data integrity. DNS over TLS (DoT) talks to the server over an encrypted channel. DNS over HTTPS (DoH) allows tunneling of DNS queries over HTTPS, and DNS over QUIC (DoQ) implements encryption on top of UDP.
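To make the difference concrete, here is a hedged command-line example (not something from the article) that queries a public resolver first over plain UDP and then over DoT; it assumes the kdig utility from knot-dnsutils is installed and uses 1.1.1.1 purely as an illustration:

```
# Plain DNS: query and response cross the network unencrypted on UDP port 53
kdig @1.1.1.1 lwn.net A

# DNS over TLS: the same query carried inside a TLS session on TCP port 853;
# +tls-ca validates the server certificate against the system CA store and
# +tls-hostname pins the name expected in that certificate
kdig @1.1.1.1 +tls-ca +tls-hostname=cloudflare-dns.com lwn.net A
```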
Even though technology to implement DNS in zero-trust networks exists, it has not been widely adopted. And while it is possible for Linux users to manually configure encrypted DNS on their machine, there is no integration into the system. Multiple DNS lookups are usually performed while doing a Linux installation, and it is possible to boot from remote sources, which requires working DNS as well. This poses the question: how do you install and boot the operating system in a zero trust environment? The answer is to integrate encrypted DNS into the system.
System-wide encrypted DNS
There are two methods that applications typically use to talk to a DNS server: the system resolver, reached through the POSIX API (getaddrinfo()), or a resolver library that talks to a DNS server directly. When using the POSIX API, the system resolver talks to the server that is configured in /etc/resolv.conf; however, it does not support fully encrypted DNS. If an application implements its own resolver, it is often possible to configure a custom address for the DNS server (it typically defaults to the contents of /etc/resolv.conf as well). Then it depends on the application whether encryption is supported—more often than not, it is not.
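That split can be seen with ordinary command-line tools; this is just an illustration (the tools are not part of the integration described here): getent resolves through glibc's getaddrinfo(), while dig carries its own resolver library and speaks DNS directly to the server listed in /etc/resolv.conf.

```
# Resolution through the system resolver (getaddrinfo() and nsswitch.conf)
getent ahosts lwn.net

# Resolution through an application's own resolver library, which reads
# /etc/resolv.conf itself and sends DNS queries directly to that server
dig lwn.net A
```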
To avoid implementing encryption in every existing application, it is possible to run a local caching DNS resolver that serves all local queries by forwarding them to the upstream DNS servers. This allows applications to talk to the local resolver over the standard unencrypted UDP port 53 on the loopback interface, while the local resolver establishes an encrypted connection to the upstream DNS server and forwards all external communication over that encrypted channel. The local resolver's address can then be put into /etc/resolv.conf so that it is used automatically. This is demonstrated by the following figure:
Technology choices
Multiple components were considered to play the role of the local DNS caching resolver. The most promising were systemd-resolved and Unbound. The clear benefits of systemd-resolved were its existing integration within Fedora and NetworkManager. However, at the time of the decision, systemd-resolved had multiple longstanding issues that the upstream was not planning to fix, especially in the DNSSEC area, as shown in the systemd GitHub issues 24827, 23622, and 19227. Systemd-resolved also remains in technology preview in Red Hat Enterprise Linux (RHEL), which was our target distribution. After consulting with Red Hat's systemd and DNS developers, we chose Unbound as a small and reliable DNS caching resolver with good support for DNSSEC. Please note that some of the systemd-resolved issues were eventually fixed.
The choice of the communication protocol was more straightforward: DoT was selected over DoH and DoQ for forwarding queries to the upstream DNS server. Although the preferred solution would support all three protocols and let the user choose, the reality is that, while there is substantial support for downstream (receiving queries) DoT and DoH in DNS servers, only DoT is usually supported for upstream (forwarding) queries. Support for DoQ is not yet widely available on either side.
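As a rough sketch of the resulting setup, the following hand-written Unbound configuration answers local clients on the loopback interface and forwards everything to an upstream DoT server; the server address, authentication name, and CA-bundle path are placeholders, and the real integration generates the equivalent configuration through NetworkManager and dnsconfd rather than by hand:

```
# /etc/unbound/unbound.conf (illustrative sketch, not the generated configuration)
server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow
    # CA bundle used to validate the upstream server's TLS certificate
    tls-cert-bundle: /etc/pki/tls/certs/ca-bundle.crt

forward-zone:
    name: "."
    # Forward all queries over TLS (port 853); the name after '#' must
    # match the upstream server's certificate
    forward-addr: 192.0.2.53@853#dns.example.com
    forward-tls-upstream: yes
```

With something like this in place, pointing /etc/resolv.conf at 127.0.0.1 lets every application that uses the system resolver benefit from the encrypted upstream path without any changes of its own.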
Integration
Any user can manually configure a local caching DNS resolver as described above. However, changes to multiple components were required to fully integrate Unbound into the system and to simplify the configuration and enforcement of encrypted DNS, starting with the boot and installation processes as well as in the running system.
The integration is centered around NetworkManager, which is the network-configuration daemon used by many distributions. From a user perspective, the main use case of NetworkManager is to obtain network information from DHCP or its configuration files and to set up the system networking properly.
Beniamino Galvani added (merge requests 2090 and 2123) new configuration and kernel options to set a static DNS server with DoT support to be used exclusively for all connections. NetworkManager already had built-in support for dnsmasq and systemd-resolved, but it did not have support for Unbound. A new plugin, dnsconfd, was added by Tomáš Korbař to handle the configuration of Unbound.
Dnsconfd is a new project that was created to sit between NetworkManager and the DNS caching resolver. It allows NetworkManager to focus on obtaining the list of upstream DNS servers instead of dealing with the configuration peculiarities of various local DNS services. It provides a generic D-Bus interface and translates calls to the interface into configuration of specific DNS resolvers. At this time, only Unbound is supported, but there is a plan to extend it for other resolvers as well.
To properly integrate encrypted DNS in the boot process, NetworkManager, dnsconfd, and OpenSSL must be included in and started from the initramfs image. Various distributions use different tools to create the image. We focused on dracut, which is used to build the initramfs image in Fedora and other distributions in the Red Hat family. Dracut has a modular architecture, where each module specifies which files are pulled into the image. NetworkManager already has its own dracut module that executes nm-initrd-generator to generate the network configuration, but it now supports the new NetworkManager options to enable the encrypted DNS. Further, Korbař and Pavel Valena implemented new dracut modules for OpenSSL and dnsconfd.
The last piece of the puzzle is to enable encrypted DNS during system installation (and of course in the installed system). Fedora and related distributions use the Anaconda installer, so we focused on this project. Since many DNS servers require a custom certificate chain to verify their TLS certificate, it is important to include this CA bundle in the installation process. For this, the Anaconda team implemented a new %certificate kickstart section that copies custom certificates during the installation.
Anaconda runs inside its own initramfs image, where dracut modules generate the necessary configuration to enable DoT during the installation process. The installer then makes sure that all required services are started and copies all of the required configuration into the installed system. This ensures that DoT can be used during installation, during boot, and in the freshly installed system, and that no unencrypted DNS query leaves the host—ever.
Encrypted DNS in identity management
FreeIPA ("identity, policy, audit") is an identity management solution—used widely with RHEL-type systems—that provides centralized authentication, authorization, and account information. It has introduced support for encrypted DNS via DoT in its integrated DNS service.
A typical FreeIPA deployment consists of one or more servers, optional replicas, and multiple clients. Servers act as the authoritative source of identity data and policies, while replicas provide scalability and redundancy. Both servers and replicas may optionally deploy the integrated DNS service, which allows them to manage DNS zones used in the identity infrastructure. Clients join the domain and interact with the servers for authentication, host enrollment, and service discovery.
In this topology, the golden rule for DNS security is clear: all DNS traffic leaving the host must be encrypted. This means clients must communicate with the DNS server over an encrypted channel (via DoT). Within the host, DNS queries may remain unencrypted as long as they occur over the loopback interface.
When a FreeIPA replica includes the integrated DNS service, it is treated similarly to a server, handling both incoming unencrypted queries from localhost and external encrypted queries. Replicas without the DNS service follow the client pattern: using Unbound as a local DoT resolver forwarding to an upstream encrypted DNS source. This distinction ensures consistent policy enforcement while accommodating different deployment needs as Triviño wrote in the design page.
The integration of DoT into FreeIPA is deliberately minimal in its first iteration, targeting new deployments and isolating the encrypted DNS logic into dedicated subpackages: freeipa-client-encrypted-dns and freeipa-server-encrypted-dns. This modular design ensures that existing installations remain unaffected when FreeIPA is upgraded, unless the user explicitly installs the new packages as implemented by Antonio Torres.
To implement DoT support, FreeIPA relies on Unbound as a local resolver and forwarder, sitting alongside the existing BIND 9.18-based DNS service. This architectural decision stems from current limitations in BIND's DoT forwarding capabilities, which are only addressed in the BIND 9.20 LTS release. The 9.20 release is not yet supported by FreeIPA due to the large number of architectural changes in BIND that require a significant rewrite of bind-dyndb-ldap (a FreeIPA plugin that reads DNS zones from LDAP).
The integration of Unbound ensures encrypted external DNS queries while allowing BIND to continue handling internal DNS zone management and resolution. On the client side, Unbound is deployed as a local caching resolver. For servers and replicas, BIND handles internal and incoming DNS queries, both encrypted and unencrypted, while forwarding external requests through Unbound using TLS. See the image below for an illustration of this.
Certificate management is handled through FreeIPA's existing public-key infrastructure. Administrators can either provide their own TLS certificates or allow FreeIPA to issue and manage them via its Custodia subsystem. This flexibility enables integration into both enterprise-managed and automated deployments.
We have provided instructions for enabling system-wide encrypted DNS and FreeIPA's encrypted DNS feature on Fedora/RHEL-like systems as a separate guide.
Upstream and downstream availability
All of the work has already been upstreamed in Anaconda, dnsconfd, dracut, FreeIPA, NetworkManager, and the System Security Services Daemon (SSSD). It has been released as part of the latest versions of all of the affected components. Fedora users may already start experimenting with system-wide encrypted DNS in Fedora 42 (run time and boot time) and Fedora 43 (current rawhide, including encrypted DNS during installation). RHEL users will see the feature as part of 9.6 and 10.0 when they are released, or it can be used now in CentOS Stream 9 and 10.
Our working group continues to expand the encrypted-DNS feature. The work that has been done so far was focused on enabling encrypted DNS for zero-trust and enterprise requirements. One of the items on the road map is to implement support for DoH forwarding, as well as for RFC 9463 ("DHCP and Router Advertisement Options for the Discovery of Network-designated Resolvers"), which allows the discovery of DoT or DoH servers via DHCP. We are also working on the bind-dyndb-ldap rewrite to make it compatible with BIND 9.20, so that it is possible to use BIND directly as the DoT forwarder and avoid running Unbound on the IPA server.
Development statistics for the 6.15 kernel
The 6.14 kernel development cycle only brought in 11,003 non-merge changesets, making it the slowest cycle since 4.0, which was released in 2015. The 6.15 kernel, instead, brought in 14,612 changesets, making it the busiest release since 6.7, released at the beginning of 2024. The kernel development process, in other words, is back up to full speed. The 6.15 release happened on May 25, so the time has come for the obligatory look at where the changes in this release came from. As a reminder, LWN subscribers can find this information and more, at any time, for any kernel version since 2005, in the LWN Kernel Source Database.
The work in 6.15 was contributed by 2,068 developers — a relatively high number, though it falls short of the record 2,090 seen in the 6.2 development cycle. There were 262 developers who made their first kernel contribution in 6.15. The most active contributors this time around were:
Most active 6.15 developers
By changesets:

| Developer | Changesets | Share |
|---|---|---|
| Kent Overstreet | 266 | 1.8% |
| Kuninori Morimoto | 191 | 1.3% |
| Ville Syrjälä | 144 | 1.0% |
| Andy Shevchenko | 137 | 0.9% |
| Alex Deucher | 123 | 0.8% |
| Nam Cao | 123 | 0.8% |
| Sean Christopherson | 117 | 0.8% |
| Krzysztof Kozlowski | 115 | 0.8% |
| Takashi Iwai | 114 | 0.8% |
| Dr. David Alan Gilbert | 111 | 0.8% |
| Thomas Weißschuh | 108 | 0.7% |
| Jani Nikula | 106 | 0.7% |
| Pavel Begunkov | 102 | 0.7% |
| Jakub Kicinski | 94 | 0.6% |
| Eric Biggers | 93 | 0.6% |
| Christoph Hellwig | 92 | 0.6% |
| Arnd Bergmann | 91 | 0.6% |
| Matthew Wilcox | 89 | 0.6% |
| Ian Rogers | 89 | 0.6% |
| Mario Limonciello | 87 | 0.6% |

By changed lines:

| Developer | Lines changed | Share |
|---|---|---|
| Wayne Lin | 80287 | 9.5% |
| Ian Rogers | 33886 | 4.0% |
| Miri Korenblit | 29176 | 3.4% |
| Bitterblue Smith | 26801 | 3.2% |
| Andrew Donnellan | 25819 | 3.0% |
| Edward Cree | 12941 | 1.5% |
| Austin Zheng | 12889 | 1.5% |
| Michael Ellerman | 12629 | 1.5% |
| Dikshita Agarwal | 8901 | 1.1% |
| Nick Chan | 8802 | 1.0% |
| Nick Terrell | 8749 | 1.0% |
| Kent Overstreet | 8296 | 1.0% |
| Christoph Hellwig | 7202 | 0.8% |
| Eric Biggers | 7012 | 0.8% |
| Dr. David Alan Gilbert | 6844 | 0.8% |
| Nuno Das Neves | 6419 | 0.8% |
| Ivaylo Ivanov | 5938 | 0.7% |
| David Howells | 5909 | 0.7% |
| Alex Deucher | 5398 | 0.6% |
| Matthew Brost | 5312 | 0.6% |
Once again, the developer with the most changesets was Kent Overstreet, who continues to work on stabilizing the bcachefs filesystem. Kuninori Morimoto contributed a large set of cleanups to the sound subsystem. Ville Syrjälä worked exclusively on the Intel i915 graphics driver. Andy Shevchenko contributed small improvements throughout the driver subsystem, and Alex Deucher worked, as always, on the AMD graphics driver subsystem.
Returning to a pattern often seen in recent years, the "lines changed" column is led by Wayne Lin, who contributed yet another set of AMD GPU header files. Ian Rogers made a number of contributions to the perf subsystem, including updating the large Intel vendor-events files. Miri Korenblit added the new "iwlmld" driver for newer Intel WiFi adapters. Bitterblue Smith added a number of RealTek WiFi driver variants, and Andrew Donnellan removed a couple of unused CXL drivers.
The top testers and reviewers this time around were:
Test and review credits in 6.15
Tested-by:

| Tester | Credits | Share |
|---|---|---|
| Daniel Wheeler | 163 | 9.2% |
| Neil Armstrong | 64 | 3.6% |
| Thomas Falcon | 35 | 2.0% |
| Babu Moger | 30 | 1.7% |
| Shaopeng Tan | 30 | 1.7% |
| Peter Newman | 30 | 1.7% |
| Amit Singh Tomar | 30 | 1.7% |
| Shanker Donthineni | 30 | 1.7% |
| Stefan Schmidt | 28 | 1.6% |
| Nicolin Chen | 25 | 1.4% |
| Xiaochun Lee | 25 | 1.4% |
| Venkat Rao Bagalkote | 24 | 1.4% |
| Andreas Hindborg | 21 | 1.2% |
| Alison Schofield | 21 | 1.2% |
| Carl Worth | 21 | 1.2% |

Reviewed-by:

| Reviewer | Credits | Share |
|---|---|---|
| Simon Horman | 271 | 2.7% |
| Krzysztof Kozlowski | 161 | 1.6% |
| Dmitry Baryshkov | 147 | 1.5% |
| Geert Uytterhoeven | 112 | 1.1% |
| Andrew Lunn | 109 | 1.1% |
| Ilpo Järvinen | 105 | 1.1% |
| Darrick J. Wong | 105 | 1.1% |
| David Sterba | 102 | 1.0% |
| Rob Herring (Arm) | 100 | 1.0% |
| Jonathan Cameron | 97 | 1.0% |
| Linus Walleij | 96 | 1.0% |
| Charles Keepax | 93 | 0.9% |
| Jan Kara | 88 | 0.9% |
| Christoph Hellwig | 82 | 0.8% |
| Jacob Keller | 81 | 0.8% |
Daniel Wheeler retains his permanent spot as the top-credited tester; nobody else even comes close. The top reviewers are a bit different this time around, with Simon Horman reviewing just over four networking patches for every day of this development cycle.
There were Tested-by tags in 1,411 6.15 commits (9.7% of the total), while 7,332 (50.2%) of the commits had Reviewed-by tags.
Work on 6.15 was supported by (at least) 195 employers, a slightly smaller number than usual. The most active employers were:
Most active 6.15 employers
By changesets:

| Employer | Changesets | Share |
|---|---|---|
| Intel | 1755 | 12.0% |
| (Unknown) | 1302 | 8.9% |
| Google | 983 | 6.7% |
| (None) | 930 | 6.4% |
| Red Hat | 889 | 6.1% |
| AMD | 881 | 6.0% |
| Linaro | 645 | 4.4% |
| SUSE | 549 | 3.8% |
| Meta | 493 | 3.4% |
| NVIDIA | 370 | 2.5% |
| Huawei Technologies | 370 | 2.5% |
| Renesas Electronics | 367 | 2.5% |
| Qualcomm | 319 | 2.2% |
| Arm | 301 | 2.1% |
| Linutronix | 296 | 2.0% |
| Oracle | 286 | 2.0% |
| IBM | 282 | 1.9% |
| Microsoft | 259 | 1.8% |
| (Consultant) | 180 | 1.2% |
| NXP Semiconductors | 179 | 1.2% |

By lines changed:

| Employer | Lines changed | Share |
|---|---|---|
| AMD | 125923 | 14.9% |
| (Unknown) | 97908 | 11.5% |
| Intel | 94150 | 11.1% |
| Google | 67461 | 8.0% |
| IBM | 48682 | 5.7% |
| (None) | 45049 | 5.3% |
| Red Hat | 43981 | 5.2% |
| Qualcomm | 34014 | 4.0% |
| Meta | 26182 | 3.1% |
| Microsoft | 19431 | 2.3% |
| Linaro | 16389 | 1.9% |
| NVIDIA | 16191 | 1.9% |
| SUSE | 15175 | 1.8% |
| Huawei Technologies | 14136 | 1.7% |
| Xilinx | 12961 | 1.5% |
| Collabora | 11640 | 1.4% |
| Arm | 9357 | 1.1% |
| NXP Semiconductors | 8857 | 1.0% |
| Rockchip | 8085 | 1.0% |
| BayLibre | 8037 | 0.9% |
This is mostly the usual list of companies that consistently support kernel work from one year to the next. Linutronix has moved up the list this time around, mostly as the result of a lot of work on the kernel's timer subsystem. IBM, once one of the top contributors to the kernel, continues to move downward.
A different view of how the process works can be had by looking at the Signed-off-by tags applied to patches, specifically those applied by developers other than the author. Those additional signoffs are the traces left when developers forward a patch or apply it to a Git repository on its way toward the mainline; they thus give a clue as to who is doing the work of herding patches upstream. For 6.15, the signoff statistics look like this:
Non-author Signed-off-by tags in 6.15
Developers:

| Developer | Signoffs | Share |
|---|---|---|
| Jakub Kicinski | 955 | 7.0% |
| Mark Brown | 774 | 5.7% |
| Andrew Morton | 649 | 4.8% |
| Alex Deucher | 571 | 4.2% |
| Ingo Molnar | 400 | 2.9% |
| Greg Kroah-Hartman | 389 | 2.9% |
| Jens Axboe | 325 | 2.4% |
| Paolo Abeni | 314 | 2.3% |
| Hans Verkuil | 257 | 1.9% |
| Thomas Gleixner | 235 | 1.7% |
| Christian Brauner | 218 | 1.6% |
| Namhyung Kim | 194 | 1.4% |
| Jonathan Cameron | 186 | 1.4% |
| Alexei Starovoitov | 183 | 1.3% |
| Johannes Berg | 160 | 1.2% |
| Heiko Stuebner | 148 | 1.1% |
| Martin K. Petersen | 137 | 1.0% |
| Vinod Koul | 137 | 1.0% |
| David Sterba | 131 | 1.0% |
| Shawn Guo | 130 | 1.0% |

Employers:

| Employer | Signoffs | Share |
|---|---|---|
| Meta | 1702 | 12.5% |
| Google | 1405 | 10.3% |
| Intel | 1310 | 9.6% |
| Red Hat | 1151 | 8.5% |
| Arm | 955 | 7.0% |
| AMD | 908 | 6.7% |
| Linaro | 768 | 5.7% |
| Microsoft | 427 | 3.1% |
| Linux Foundation | 418 | 3.1% |
| SUSE | 404 | 3.0% |
| (Unknown) | 376 | 2.8% |
| (None) | 331 | 2.4% |
| Qualcomm | 307 | 2.3% |
| NVIDIA | 304 | 2.2% |
| Huawei Technologies | 289 | 2.1% |
| Linutronix | 283 | 2.1% |
| Cisco | 281 | 2.1% |
| Oracle | 202 | 1.5% |
| LG Electronics | 194 | 1.4% |
| IBM | 173 | 1.3% |
One patch out of every eight going into the kernel now passes through the hands of a maintainer at Meta, and nearly as many are handled by Google developers.
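For the curious, a rough approximation of the non-author-signoff numbers can be produced with plain git; the sketch below is naive about matching names and email addresses compared to LWN's tooling, but it shows the basic idea of listing each commit's Signed-off-by trailers and dropping the author's own:

```
#!/bin/sh
# Rough count of non-author Signed-off-by tags between two releases.
range=v6.14..v6.15
git log --no-merges --format='%H %ae' "$range" | while read -r commit author; do
    # List the Signed-off-by trailer values, then drop the author's own signoff
    git show -s --format='%(trailers:key=Signed-off-by,valueonly)' "$commit" |
        grep -iFv "$author" | grep .
done | sort | uniq -c | sort -rn | head -20
```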
As of this writing, there are well over 12,000 commits in linux-next, almost all of which can be expected to find their way into the kernel during the 6.16 merge window. That suggests that the next development cycle will be as busy as this one was. As always, keep an eye on LWN to keep up with the next kernel as it is assembled and stabilized.
Long-duration stress-testing for filesystems
Testing filesystems is a frequent topic at the Linux Storage, Filesystem, Memory Management, and BPF Summit (LSFMM+BPF); the 2025 edition was no exception. Boris Burkov led a filesystem-track session to discuss stress-testing filesystems—and running those tests for lengthy periods. He reviewed what he has been doing when testing filesystems and wanted to gather ideas for what could be done to catch more bugs before the filesystems hit production.
He began by noting that he works for Meta on Btrfs, which means that he
spends a lot of time tracking down "weird bugs that you only see on
millions of computers
". Production use stresses filesystems, so it
makes sense for filesystem developers to do that stressing ahead of time to
try to catch bugs before they reach production. To get an idea of what
kinds of bugs made it into production, he surveyed ones that Meta had
encountered; "it was a quick biased sample
" that may miss some types
of bugs or overemphasize others. There were two data-corruption bugs, a
metadata corruption that "took a few months of Josef [Bacik] and I
debugging it to find
", a noisy-neighbor problem where misbehaving
containers could cause a problem in other containers due to CPU and
global-lock contention, and corruption when trying to enable large folios on
XFS.
![Boris Burkov [Boris Burkov]](png/lsfmb-burkov-sm.png)
Burkov tried to extract some patterns from those problems and their investigations. One important thing emerged: all of these kernels were tested with fstests using the "auto" group (i.e. -g auto). That group is a set of tests meant for regression testing. It is something that Meta runs daily and on every commit; he thought that many other developers and companies were probably doing something similar.
The way these problems were reproduced for debugging was with custom
scripts that were somewhat similar to the buggy workload; they would often
require hours or days to reproduce the problem. Those runs were done with
"a high degree of parallelism and perhaps with other stressing
conditions
" until the problem would occur. Frequently, getting the bug
to reproduce relied on memory pressure or increased concurrency.
Data integrity seems to be something of a blind spot, he said. Roughly half
of the bugs boiled down to some kind of data corruption, and "far from
half of fstests are about data corruption
".
The obvious next step is to look at what others have done for
stress-testing, he said. There is a "soak" group for fstests, which is
aimed at longer-duration tests, as well as
fsstress
and fsx
from the Linux Test
Project (LTP). In the default settings (though he noted that his defaults
may not be universal), those tests generally do not run for more than about
ten minutes in practice. The SOAK_DURATION
parameter can be used to run that group for as long as desired, which
Darrick Wong described in his response
to Burkov's session
proposal post. Most of the stress tests in fstests run fsstress or fsx
in combination with some other "nasty thing
" like CPU hotplug, Btrfs
balance operations, or XFS scrub operations.
The operations that fsstress uses are extensive, and include some filesystem-specific operations, all of which is great, Burkov said. But what is lacking are some stressors, the biggest of which is memory pressure. Another is more parallelism for the operations, which is something that Dave Chinner mentioned in conjunction with the check-parallel test script in his reply to the topic proposal. Chris Mason had suggested adding in random filesystem-sync and cache-clearing operations into the mix as well, Burkov said.
As part of his research into filesystem stress-testing, he came across a
paper
about the NFSv4 test project, which "had a passage that struck
me
":
One year after we started using FSSTRESS (in April 2005) Linux NFSv4 was able to sustain the concurrent load of 10 processes during 24 hours, without any problem. Three months later, NFSv4 reached 72 hours of stress under FSSTRESS, without any bugs. From this date, NFSv4 filesystem tree manipulation is considered to be stable.
That is how the NFSv4 developers decided the filesystem was stable.
The 72 hours of fsstress is not really part of his testing, though he
thinks Btrfs would pass that bar. He does not really think of Btrfs as
"stable", however. It is 20 years since that statement was made for NFSv4,
so Burkov wondered what the modern equivalent for today's filesystems
should be. His proposal was:
"Run dozens of relevant complex operations including fsstress and fsx in
parallel under memory pressure for 72 hours
", where the dozens
include various, sometimes filesystem-specific, operations such as sync,
reflink, balance, dropping caches, memory compaction, and CPU hotplug.
Another option might be to modify fsstress itself, perhaps by using its
-x option to run commands to, say, drop the caches, or by running it
in a control group to add memory pressure. In addition, data-integrity checks and
more filesystem-specific stressors could be added. The
check-parallel script does not really fulfill the goals that he
sees as needed, but "there's definitely room for it to grow into that
space
". He was open to suggestions if none of those really appealed;
"what do people think we should do as a gold standard for stress
testing?
"
Ted Ts'o thought that check-parallel, which Chinner had suggested, may be
good for triggering these kinds of problems, but not great for finding
them. There is, however, a need to
find "ways of running these soak tests which are reproducible
enough
" to "reliably trigger the failure multiple times
" in the
shortest time possible. Tests that take 72 hours and fail 50% of the time,
for example, will be difficult to use to track down bugs and to verify that
they have been fixed, so quick reproducibility is important. He is
concerned that relying on check-parallel will make that more
difficult because it is so timing-dependent.
In his testing, Ts'o has found that using a variety of storage devices is
important; "some things only trigger if you're using a fast SSD, other
things only trigger if you are using spinning-rust platters
". If a
problem happens once on a hard disk, for example, try to reproduce it on a
fast SSD or ramdisk, he said. "It's not enough just to say 'great we
were able to trigger a failure', it's 'can we trigger a failure
reliably?'
" Burkov said that he could not agree more, as fast
reproducibility is what he is constantly working toward with his testing
and tools.
Chuck Lever said that he had a suggestion for a test to use, but feared it would fall into the "intermittently reproducible" bucket: the Git regression test suite. It runs nearly 1000 different tests and checks the contents of the files that are manipulated using Git operations. He turns that functionality test into a stress test by running it in multiple threads using a command like "make -j 16". Often the single-threaded test will run reliably many times, but the multi-threaded test will fail and generally pretty quickly. But it is sometimes hard to track down what went wrong, he said, because it is testing Git, not filesystems.
Zach Brown said that the parameter space for filesystems was so large that it was not really productive to try to claim that it has been exhaustively explored. But, as Burkov has seen, there are parts of that parameter space that are frequently seen in production, but have not been tested much; a good example is memory pressure, Brown said, which is a problem area that he has also observed. He wondered if it made more sense to try to somehow fingerprint production deployments to determine where in the parameter space they are running, which could point to areas that are not being tested.
Bacik and Ts'o both thought that adding more data verification into fsx and fsstress would be useful; the belief is that neither does much or any of that right now. Mason said that another stressor that should be added into the mix is memory compaction; there are various ways that the memory-management subsystem moves pages around underneath the filesystems, which may help shake out bugs. Luis Chamberlain suggested running tests in virtual machines with different types of filesystems in the guest and host.
Ts'o said that it might make sense to collaborate on the "antagonists
"
(stressors) so that they can be run in all of the different test harnesses
that are in use. Once that is done, "a bunch of standard antagonist
packages
" could be added to fstests; if the antagonists are defined at
that level, more filesystems and testers will be able to use them. As time
ran out, Chamberlain noted that the Rust developers require tests to
accompany new APIs before they can be merged, but there is no such
requirement for filesystem APIs; that should change, he said.
Formally verifying the BPF verifier
The BPF verifier is an increasingly complex and security-critical piece of code. When the kinds of people who are apt to work on BPF see a situation like that, they naturally question whether it's possible to use formal verification to ensure that the implementation of the code in question is correct. Santosh Nagarakatte led the first of two extra-long sessions in the BPF track of the 2025 Linux Storage, Filesystem, Memory Management, and BPF Summit about his team's work formally verifying the BPF verifier with a custom tool called Agni.
Agni's history
![Santosh Nagarakatte [Santosh Nagarakatte]](png/santosh-nagarakatte-lsfsmmbpf-small.png)
Work on Agni began about 6 years ago, Nagarakatte said, when he got interested in the PREVAIL BPF verifier, and met other people excited to study it. Since then, Harishankar Vishwanathan, Matan Shachnai, Srinivas Narayana, and Nagarakatte have been working at the Rutgers Architecture and Programming Languages Research Group to develop the tool.
The Linux kernel's BPF verifier is probably the first real instance of formal
verification "in production
", Nagarakatte said.
Other projects that use formal verification
tend to do so "on the side
", not as part of the running, deployed system.
That makes it interesting because writing correct formal verifiers is hard, and
the BPF verifier will often be running in a context where it's hard for the
original developer to spot errors.
So, he asked, can we understand the algorithms that the BPF verifier uses, and guarantee that they're correct? The BPF verifier has a lot of different components, so Nagarakatte and his team decided to start by tackling value tracking: the part of the verifier that determines what values a variable can have at different points in the program. Narayana's later session, which will be the subject of a separate article, covered their subsequent work on checking whether the verifier's path-pruning algorithm is correct.
Their first stab at the problem was to manually encode and check some proofs about the BPF verifier's abstract-value-tracking implementation. That worked fine for addition, but they couldn't make it work for the verifier's checking of multiplications. As a result of that experience, they ended up writing a new algorithm for multiplying abstract values that was amenable to verification, and got that accepted into the mainline kernel. So, from that work, they were confident that addition and multiplication were correct, which is already useful.
The BPF verifier changes all of the time, however, and manually keeping their proofs up to date was clearly not going to be feasible. That's where Agni steps in: it takes the C source code of the BPF verifier and converts it into a satisfiability modulo theories (SMT) problem that can be automatically proved or disproved by an SMT solver such as Z3. If the solver can prove that the verifier is correct, that's excellent.
If it finds a counterexample, however, the raw output is not particularly useful. Ideally, Nagarakatte's team wants the BPF developers to be able to use Agni as an extra check during development — something that can be used to test changes before they actually make it into the kernel. In pursuit of that goal, they added a program-synthesis component. If the SMT solver finds that the verifier is not correct, Agni will take the output of the SMT solver and use it to construct a proof-of-concept BPF program that triggers the bug in the verifier. That can be fed back to the developer to illustrate where the failure comes from.
Verifying arithmetic
With that high-level history of the project out of the way, Nagarakatte went on
to explain how Agni actually does this. First, it takes the C source
code and compiles it to LLVM's intermediate representation (IR).
Agni doesn't need to handle every corner-case of the IR because it turns out
that the verifier's code is not
"as bad as other real world C
" — it uses a fairly limited subset of the
language.
Once Agni has the IR,
it uses LLVM's dead-code elimination to focus on a single operator at a time by
discarding all of the parts of the verifier that aren't relevant to that operator.
Those operators are used to combine the verifier's abstract representations of what a variable could be. So it's not as simple as adding two concrete numbers — instead, the verifier has to be able to answer questions like "if register 1 has a number between 0 and 100, and register 2 has a number between 3 and 5, is their sum less than the length of this array?". This information is used throughout the verifier to ensure that accesses are within-bounds and aligned.
In particular, the verifier tracks which bits of a value are known exactly, as well as what its range of possible values is as a signed or unsigned number. Shung-Hsi Yu led a session at the 2024 summit about his work simplifying the representation of these abstract values.
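As a rough illustration of the known-bits part of that tracking (the kernel calls these structures "tnums"), the sketch below models an abstract value as a (value, mask) pair, where set bits in the mask are unknown; it is a simplified stand-alone example, not the kernel's implementation.

```c
/*
 * Simplified sketch of the "known bits" idea: value holds the bits
 * that are known, mask marks the bits whose value is unknown.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct known_bits {
	uint64_t value;	/* known bit values (unknown bits are zero here) */
	uint64_t mask;	/* 1 = this bit is unknown */
};

/* Does the concrete value x fit the abstract description kb? */
static bool kb_contains(struct known_bits kb, uint64_t x)
{
	/* All of x's known bits must match the known values. */
	return (x & ~kb.mask) == kb.value;
}

int main(void)
{
	/* "All bits above the lowest two are known to be zero." */
	struct known_bits kb = { .value = 0, .mask = 0x3 };

	printf("3 fits: %d\n", kb_contains(kb, 3));	/* prints 1 */
	printf("4 fits: %d\n", kb_contains(kb, 4));	/* prints 0 */
	return 0;
}
```

In this sketch, a mask of 0x3 corresponds to the "only the lowest two bits are unknown" situation that comes up again below, where it implies a numeric range of 0-3.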
For each mathematical and bitwise operator, Agni takes the LLVM IR and translates it into a machine-checkable specification that the operator is implemented correctly. This transformation ends up using type information from the LLVM IR, which poses a problem because some of that type information is not available in LLVM version 15 or higher. Eventually, when the kernel updates to require LLVM 15, Agni will break and the BPF developers will need to find an alternate approach. That was a problem Nagarakatte wanted to discuss with the assembled developers in more depth.
What it means for an abstract operator of this type to be correct ("sound") is remarkably straightforward, as complicated mathematical definitions go. Suppose that there are two abstract values (considered as sets of possible values, even though this isn't how the verifier represents them in memory), P and Q, and two specific numbers, x and y, which are members of P and Q respectively. The verifier's implementation of "+" is sound if the abstract representation that comes out of calculating the operation of "+" on two registers containing P and Q always contains the number "x + y". That is to say, given some specific numbers that are correctly modeled by two abstract register states, adding the two numbers should produce something that is correctly modeled by the addition of the two abstract states.
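To make that definition concrete, here is a small, self-contained sketch (not Agni or kernel code) that models an abstract value as an unsigned interval, implements a naive abstract addition, and checks the soundness property by brute force for one pair of inputs. A real proof quantifies over all possible abstract inputs, which is exactly the part that the SMT solver automates.

```c
/*
 * Toy soundness check: abstract values are inclusive intervals of
 * 8-bit unsigned values, and abstract "+" adds the bounds, giving up
 * (returning "anything") if the result could wrap.
 */
#include <stdio.h>

struct interval {
	unsigned int min, max;	/* inclusive bounds, 0..255 */
};

static struct interval abstract_add(struct interval p, struct interval q)
{
	struct interval r = { p.min + q.min, p.max + q.max };

	if (r.max > 255)	/* could wrap in 8 bits: say "anything" */
		r = (struct interval){ 0, 255 };
	return r;
}

int main(void)
{
	struct interval p = { 0, 100 }, q = { 3, 5 };
	struct interval r = abstract_add(p, q);

	/* Soundness: every concrete x in P plus y in Q must land in R. */
	for (unsigned int x = p.min; x <= p.max; x++) {
		for (unsigned int y = q.min; y <= q.max; y++) {
			unsigned int sum = (x + y) & 0xff;	/* 8-bit wrap */

			if (sum < r.min || sum > r.max) {
				printf("unsound: %u + %u -> %u not in [%u, %u]\n",
				       x, y, sum, r.min, r.max);
				return 1;
			}
		}
	}
	printf("abstract add of [0,100] and [3,5] is sound: [%u, %u]\n",
	       r.min, r.max);
	return 0;
}
```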
Complications
At first, they had planned to verify each way that the verifier tracks values (as known bits, and signed and unsigned ranges) independently. That turns out not to work, however, because the verifier actually shares information between these representations. For example, if it knows that all of the bits other than the least significant two are zero, it also knows that the signed and unsigned ranges are 0-3. In the absence of this sharing of information, the BPF verifier's implementation would be unsound. The academic term for this sort of thing is a "shared refinement operator"; a refinement operator being something that slims down an abstract value by ruling out impossible values.
Once they were able to successfully model the shared refinement operator, they finally got confirmation that modern kernels are sound. Specifically, they were able to show that kernels from version 5.13 onward were sound. The oldest kernel version they tested was 4.14, so that left the problem of how to demonstrate an actual problem in the kernels between those versions — or, if they could not, to discover another deficiency in Agni.
This is where the idea of synthesizing BPF programs came in. If Agni can prove that the verifier's implementation of an operator is not correct, that essentially means that it has figured out a way to add two registers that outputs a concrete value the verifier is not expecting. Then the problem becomes: how to create a BPF program that puts the verifier into those specific abstract states, and ends up calculating the bad final value.
They saw that the real-world failures from earlier kernel versions were generally caused by fairly simple conditions, and so ultimately selected a brute-force approach. Agni will consider every BPF program that uses a series of arithmetic instructions ending in the flawed one in increasing order of program length, and return the smallest that triggers the bug.
This approach worked to generate several proof-of-concept BPF programs for older kernels. Unfortunately, SMT-solving is NP-complete, and, as the verifier has become more complicated, the time it takes Agni to verify that its implementation is correct has grown. Agni ran against kernel version 4.14 for 2.5 hours, against version 5.13 for ten, and against version 6.4 for several weeks. Then, Andrii Nakryiko posted a patch that improves the accuracy of the verifier's shared refinement operator, which significantly slows Agni's analysis, leading to timeouts.
Going faster
At this point, the team working on Agni was in a rough place: they had a working tool that could turn up real bugs in the BPF verifier, but it wasn't going to be able to keep up with new kernels because of scaling problems. So they decided to try to break the problem down into subproblems that could be solved independently.
Each abstract operator that Agni extracted from the verifier came to about 5,000 lines of SMT-LIB code. Of those, about 700 lines are the actual operator itself, and the rest is the code for the shared refinement operator. They decided to see if they could verify the shared refinement operator once, and share that proof between all of the operators.
That approach didn't work, because it turns out that the shared refinement operator was also masking latent unsoundness in some of the bitwise operations. These didn't represent real bugs, because in the actual verifier the shared refinement operator was always used. But they did represent a barrier to Agni, because it seemingly made it impossible to verify the shared refinement operator independently of the operations that used it.
The solution ended up being to submit a small fix for the bitwise operators. Once those patches were accepted, the divide-and-conquer approach became feasible, and Agni's run time dropped to less than 30 minutes for kernel version 6.8.
Future work
John Fastabend asked whether modeling the shared refinement operator separately allowed them to say whether the fixed versions of the bitwise operators were more or less precise (in the sense of more closely approximating the minimal set of possible values of the output). Nagarakatte said that it was exactly as precise, actually. Daniel Borkmann asked whether they had looked into whether the shared refinement operator could be made more precise. Nagarakatte said that they were experimenting with that internally, and once they have a better refinement operator that they're confident won't break anything, they'll submit a patch set.
Fastabend asked whether they would be able to use the tool to find redundancy in
the C code — that is, conditions that the verifier checks even though a check is not
needed. Nagarakatte responded that one of his students was working on a project
to synthesize abstract operators from scratch, which "should be as
good or better than what the kernel does
". They've already come up
with a more concise representation for abstract values, although the data
structure the kernel uses has already been proved to be maximally precise.
Recently, Nagarakatte's student shared a patch that improves the precision of the multiply instruction to work better with negative values. He wants to work with them to put together a paper on the technique once they can explain it, at which point it may be applicable to other parts of the verifier.
With Agni fully described, he then wanted to turn to the topic of how to move forward. The main upcoming problem Nagarakatte foresees is the kernel moving to LLVM 15. His preferred resolution would be for the BPF developers to rewrite the verifier in some abstract specification language, which could be used as an input to Agni and as a source of generated C code. He was optimistic that writing the verifier in a higher-level language would make improving the verifier and reviewing it easier for everyone.
Borkmann mentioned that Nagarakatte had proposed the idea of embedding some kind of domain-specific language (DSL) for the verifier in the comments of the C source code; he asked whether that invites the problem of ensuring that the DSL actually corresponds to the C code. Nagarakatte agreed that was a problem, but it's a much easier problem than parsing C source code correctly without LLVM.
Another
audience member pointed out that any DSL for verifier code would be yet another
language to learn — "how do we make that easy?
"
Nagarakatte explained that when he said it would be nice to use a DSL, he didn't
mean anything too complicated. One of the problems that they're dealing with in
Agni is handling arguments that are passed in pointers; right now, they're
relying on LLVM's analysis to remove memory accesses from the code to make
modeling it easier. If the developers could specify argument types with a DSL,
it could potentially simplify things.
One person asked whether this kind of approach could be extended to other parts of the kernel. Nagarakatte said that there are other static-analysis-based approaches that could be applied to other parts of the kernel. The seL4 microkernel, for example, has a formal proof of correctness. He hasn't been working on that, though; he has been focusing on Agni. Ultimately, as with so many things in open source, it just needs someone to take the time to make it happen.
Amery Hung wanted to know whether there were other parts of the verifier
that could be formally verified, beyond arithmetic operations. Nagarakatte said that he was
excited about looking at Spectre mitigations, which he thinks may be provably
unnecessary in some places. The group is also planning to look at improving
precision, and verifying the correctness of the verifier's path-pruning
algorithm, which was the subject of Narayana's talk.
The path-pruning logic is "leaving a lot on the table
", he
said, because the logic is widely dispersed throughout the code, which makes it
hard to simplify.
There were a few more minutes of clarification about the exact claims that Agni
proves, and why newer LLVM versions were problematic, but eventually the session
came to a close.
Verifying the BPF verifier's path-exploration logic
Srinivas Narayana led a remote session about extending Agni to prove the correctness of the BPF verifier's handling of different execution paths as part of the Linux Storage, Filesystem, Memory Management, and BPF Summit. The problem of ensuring the correctness of path exploration is much more difficult than the problem of ensuring the correctness of arithmetic operations (which was the subject of the previous session), however. Narayana's plan to tackle the problem makes use of a mixture of specialized techniques — and may need some assistance from the BPF developers to make it feasible at all.
Path exploration is a key component of the BPF verifier, Narayana said. It's what makes it practical for the verifier to infer precise bounds for registers even in the presence of conditionals and loops. The brute-force approach to path exploration would be to consider every possible path through the program. That means considering a number of paths exponential in the number of conditionals, which would be slow.
Instead, the verifier is somewhat selective about exploring paths: it attempts to explore only paths that are essentially different from other paths that have already been considered. In other words, if the register values along a path are a subset of what has already been checked, the verifier can avoid exploring that path because it knows the BPF program has already been verified under more general preconditions.
This optimization substantially speeds up the verification of programs with complex control flow; it's also quite complicated to implement correctly, and has already resulted in at least one security problem. Narayana wants to use Agni to show that the current path-pruning logic is implemented correctly.
Unlike with arithmetic operators, however, specifying what a correct implementation of path pruning looks like is difficult. The core requirement is that pruned paths must only exhibit a subset of the previously explored safe behaviors of the program, but the path-pruning logic depends on several other parts of the verifier to make that determination. For example, the verifier tracks whether each register is used in a subsequent computation (whether it is "alive") in order to decide whether a register can be relevant to a path. So the correctness of path pruning depends on the soundness of this tracking.
There is a lot of existing academic research on how to make sure tracking the future use of a register is correct; the problem is how to apply that research to the verifier. Narayana's proposal is to use the existing research to produce a set of exhaustive tests covering every possible scenario. Testing is not normally thought of as a formal-verification technique, but exhaustive testing is essentially a direct proof of correctness. The difficulty is in showing that the set of tests is actually exhaustive. A similar approach can be taken for other parts of the verifier that deal with tracking the use of registers.
Narayana listed eight total conditions that must be fulfilled for path pruning to be correct. Four of these are basic assumptions about how the verifier is called and the safety properties of BPF programs that must be manually audited by a human. One is already covered by Agni: the correctness of arithmetic operations. Another is the requirement that dataflow algorithms (such as tracking whether registers are alive) are correct, which he intends to ensure through testing. The final two are specific to path pruning: "state containment" and "sound generalization".
State containment
State containment is the simpler property to explain, but it still benefits from the use of an example. In Narayana's slides, he used an image of a control-flow graph to illustrate his point, but for readers without a background in compiler design, this program may be clearer:
```c
    ...
    int r2;
    int r4;

    if (r1 == 10) {
        r4 = 15;
        r2 = r4;
    } else {
        r2 = rand(0, 20);
    }

    int r3 = r1 + r2;
    ...
```
Suppose the verifier has been verifying a version of this program that has been compiled to BPF, with the integer variables being stored in the BPF registers with the same names. The verifier will reach the assignment to r3 by two different paths: one where r2 is 15, and one where r2 is some number between 0 and 20. The question of state containment is: is the abstract state of the program in the first case a subset of the abstract state of the program in the second case? It's easy to see that if r2 can be anything between 0 and 20, it can also be 15. In fact, Agni already has a correctness proof for the function that calculates these kinds of comparisons in the verifier as part of its existing scope.
What about r4? In the first state, it is also 15. In the second state, it hasn't been assigned to, and therefore reading from it would be forbidden. Logically, if the code were to read from r4 at some point in the future, then the whole program would be rejected. Therefore, it's valid for the verifier to consider the first state as "contained" in the second state: if the second state eventually leads to the program being verified correctly, then the first state would have done the same, so the first state doesn't actually need to be explored further.
In the general case, when the two states being compared have come to the same point in the program via arbitrarily complex paths, the question of state containment breaks down into the same two parts: whether the possible values of registers in one state are a subset of possible values in the other state, and whether the legal dataflow from registers in one state is more restrictive than the legal dataflow from registers in the other state. Narayana wants to research how to formalize the rules for answering the second part correctly. Trying to write down the rules formally will help us prove it, he said.
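The range half of that containment question boils down to a bounds comparison; the sketch below is loosely modeled on the verifier's approach (it is not the kernel's code) and shows the shape of the test: the previously verified ("old") state must be at least as general as the current one. The dataflow half of the question is handled separately.

```c
/*
 * Simplified sketch of the range half of the containment check: the
 * current state may be pruned only if every value it allows was
 * already allowed by the previously verified state.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct reg_range {
	uint64_t umin, umax;	/* unsigned bounds */
	int64_t smin, smax;	/* signed bounds */
};

static bool range_within(const struct reg_range *old,
			 const struct reg_range *cur)
{
	return old->umin <= cur->umin && old->umax >= cur->umax &&
	       old->smin <= cur->smin && old->smax >= cur->smax;
}

int main(void)
{
	/* r2 from the example: [0, 20] already verified, exactly 15 now. */
	struct reg_range old = { .umin = 0, .umax = 20, .smin = 0, .smax = 20 };
	struct reg_range cur = { .umin = 15, .umax = 15, .smin = 15, .smax = 15 };

	printf("prunable: %d\n", range_within(&old, &cur));	/* prints 1 */
	return 0;
}
```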
Sound generalization
The final piece of the puzzle for proving the path-exploration logic correct is sound generalization. Consider this slightly modified example:
```c
    ...
    int r2;

    if (r1 == 10) {
        r2 = 47;
    } else {
        r2 = rand(0, 20);
    }

    int r3 = 5 + r2;
    ...
```
In this case, the path where r1 is 10 results in a state that is clearly not a subset of the path where r1 is something else. These two states have different possible values for r3. The verifier, however, will sometimes unify these states anyway. Suppose that from this point in the program onward, it doesn't matter whether r3 is from 5 to 25 or 52. Suppose that the program would be correct as long as r3 is less than the length of an Ethernet frame, which is at least 64 bytes. If that were the case, then the verifier could combine the states even though one is not contained in the other.
In general, this kind of pruning (called generalization) is correct as long as the combined state that the verifier creates (such as "r3 is between 5 and 52") is stronger than the weakest precondition required to ensure that the program from this point onward is still safe. This is easy to check for a single program, Narayana said, but figuring out how to prove it for all programs is somewhat tricky.
His current idea is to take an existing algorithm for finding weakest preconditions that has a proof of correctness, and generate a set of exhaustive tests showing that in every case, the preconditions computed by the verifier are at least as strict as the preconditions computed by the formally verified algorithm. In this way, the proof of correctness for a well-known, high-level algorithm can essentially be automatically extended to cover the verifier's implementation.
The idea of testing the verifier in that way raises an obvious question, however: why not simply use an existing algorithm for finding weakest preconditions directly? Narayana looked at the path-pruning code in the 6.12 kernel, and found that it was not generalizing states in all possible cases, resulting in wasted work spent verifying paths that don't need it. If the verifier were changed to compute the weakest precondition in a systematic way, it would be both more efficient and easier to prove correct (by proving that the C implementation of the weakest-precondition-finding code matches the known-correct high-level algorithm).
Going forward
Path exploration is critical to both the correctness of the verifier, and to its performance, Narayana said. It's a challenging problem, with a lot of opportunity for error. Extending Agni to show that the verifier's path exploration is correct is going to require substantial work. While he and his colleagues intend to keep working on it, there are a few things that the BPF developers can do to help. For one, he would like to be involved in the discussion of any new features that might impact the path-pruning logic.
He reiterated Santosh Nagarakatte's call for the BPF verifier to start including structured comments in a domain-specific language (DSL), to make writing proofs about it easier. In response to a question from the audience, he clarified that he does not have a specific DSL in mind, but introducing any higher-level abstraction over C will make it easier to prove that the verifier implements a particular algorithm that corresponds with existing research.
The assembled BPF developers were generally supportive of his work, although they recognized it as an ambitious project. Agni has already helped eliminate bugs in the simplest parts of the verifier; hopefully, Narayana and his colleagues will be able to bring similar guarantees to the parts of the BPF verifier most in need of them.
Allowing BPF programs more access to the network
Mahé Tardy led two sessions about some of the challenges that he, Kornilios Kourtis, and John Fastabend have run into in their work on Tetragon (Apache-licensed BPF-based security monitoring software) at the Linux Storage, Filesystem, Memory Management, and BPF Summit. The session prompted discussion about the feasibility of letting BPF programs send data over the network, as well as potential new kfuncs to let BPF firewalls send TCP reset packets. Tardy presented several possible ways that these could be accomplished.
Sending data
Tetragon has two general jobs: enforcing security policies and collecting statistics and other information for observability. The way that the latter currently works, Tardy explained, introduces unnecessary copies. BPF programs will create records of events and place them into a ring buffer. Then Tetragon's user-space component reads the events and eventually writes them to a file, a pipe, or a network socket in order to centralize and store them.
![Mahé Tardy [Mahé Tardy]](png/mahe-tardy-lsfsmmbpf-small.png)
That requires a minimum of two copies between the kernel and user space. While exploring alternatives, Tardy realized that this situation could be avoided if BPF programs were allowed to call vmsplice(). The user-space agent could give the BPF program a file descriptor, and let BPF call vmsplice() to forward the information. Eventually, it might be possible to remove the user-space agent altogether.
An alternative to vmsplice() would be to use io_uring to perform the same operations. Tardy clarified that for his use case, he really mostly cares about being able to send data over the network. Generally, Tetragon sends two types of data: alerts and periodic reports. The periodic reports are created in a timer callback, which may cause additional complications since he isn't sure whether those are called in a sleepable context.
Andrii Nakryiko thought that a synchronous send operation — which could block for a long time — would be a bad fit for BPF. Tardy agreed, saying that an asynchronous send operation would be fine. Nakryiko thought this was a lot of effort to avoid a small number of copies. Alexei Starovoitov pointed out that there is such a thing as a kernel TCP socket, so this is technically possible. Also, workqueues call their tasks in a sleepable context, so the operation could be run as a workqueue item and that would work. He agreed that it seemed like a lot of effort to avoid user-space copies, though.
Tardy explained that forwarding these reports
is "almost the last thing the agent is doing
". If it could be done in
BPF, Tetragon would be close to being implemented in pure BPF. Although he
didn't speak to why this would be desirable, an earlier session had raised the
idea of making security software harder to tamper with by avoiding user-space
components, so that may have been what he had in mind.
Starovoitov pointed out that there is an ongoing effort to use netconsole to send kernel log messages over TCP. So perhaps Tetragon's BPF programs could be made to print to the console, which is then sent over TCP. Daniel Borkmann asked whether netconsole could send arbitrary data; Starovoitov said that it could. Tardy suggested that they could start by prototyping something using netconsole's existing UDP-based messages. The session ended without coming to a firm conclusion, but Tardy left with a number of new directions to explore.
TCP reset
Currently, it is possible for BPF firewalls to drop packets, and therefore to terminate a TCP connection de facto. It would be friendlier, Tardy said in his second session, to send a TCP reset to immediately terminate the connection. This is already what other firewalls, like netfilter, do; Tardy wants to add a kfunc to let BPF programs do the same thing.
One possible way to add that would be to extend the bpf_sock_destroy() function that Aditi Ghag added in 2023. That function lets BPF programs close sockets in specific circumstances: while inside an iterator and holding the socket lock. The fact that it sends a TCP reset is really a side effect of its main operation, but it is somewhat related.
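For reference, a minimal sketch of how a BPF program can call bpf_sock_destroy() from a TCP socket iterator is shown below. It follows the general pattern used in the kernel's selftests; the filtering condition (destroying sockets bound to local port 8080) is purely illustrative.

```c
/*
 * Minimal sketch of using the existing bpf_sock_destroy() kfunc from
 * a TCP socket iterator; the port-8080 filter is illustrative only.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

int bpf_sock_destroy(struct sock_common *sk) __ksym;

SEC("iter/tcp")
int destroy_port_8080(struct bpf_iter__tcp *ctx)
{
	struct sock_common *skc = ctx->sk_common;

	if (!skc)
		return 0;

	/* skc_num is the local port, in host byte order. */
	if (skc->skc_num == 8080)
		bpf_sock_destroy(skc);

	return 0;
}

char _license[] SEC("license") = "GPL";
```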
Borkmann pointed out that using bpf_sock_destroy() would only work if the socket existed on the machine in question; a firewall sitting between a client and a server would need a different way to send a reset. Another member of the audience suggested setting up an unroutable route, forwarding a packet from the TCP connection to that, and letting the existing networking stack handle the rest.
There is already a kernel function that allows BPF programs to send TCP acknowledgment messages; in light of that, adding one for sending reset messages struck some people as not a big deal. Ultimately, this discussion didn't reach a conclusion either, but there was no real opposition to the idea of allowing BPF programs to cleanly terminate TCP connections.
Reports from OSPM 2025, day two
The seventh edition of the Power Management and Scheduling in the Linux Kernel Summit (known as "OSPM") took place on March 18-20, 2025. Topics discussed on the second day include improvements to device suspend and resume, the status and future of sched_ext, the scx_lavd scheduler, improving the efficiency of load balancing, and hierarchical constant bandwidth server scheduling. As with the coverage from the first day, each report has been written by the named speaker.
Device suspend/resume improvements
Speaker: Rafael J. Wysocki (video)
Possible improvements to device suspend and resume during system-wide power-management (PM) transitions were discussed. To start with, Wysocki said that this topic was not particularly aligned with the general profile of the conference, which focused on scheduling and related problem spaces, but he thought that spending some time on it might be useful anyway. It would be relatively high-level, though, so that non-experts could follow it.
He provided an introductory part describing the design of the Linux kernel's code that handles transitions to system sleep states and back to the working state, and the concepts behind it.
A system is in the working state, he said, when user-space processes can run. There are also system states, referred to as system sleep states, in which user space is frozen and doesn't do any work; these include system suspend and hibernation. The system enters sleep states to save energy, but when user work needs to be done, it goes back to the working state. Those transitions, referred to as system suspend and resume, respectively, affect the system as a whole and, if the kernel is configured to support system sleep states, every system component needs to play its part in handling them. In other words, support for system suspend and resume (and hibernation, if the kernel is configured to support it) is mandatory.
As a rule, transitions from the working state into one of the sleep states are initiated by user space, but transitions from a sleep state back into the working state are started in response to a signal from a device; this signal is referred to as a system wakeup event. Devices allowed to trigger system wakeup events are referred to as wakeup devices.
When a transition into a system sleep state is started, all devices need to be suspended. All activity must be stopped, hardware needs to go into low-power states, and wakeup devices need to be configured to trigger wakeup events. During a transition back into the working state, the reverse needs to happen, except that, in some cases, it is possible (or even desirable) to leave a device in suspend after a system resume and let it be handled by run-time power management. All of that should be as fast as reasonably possible because some systems, like phones, suspend and resume often.
In the working state, individual components of the system are subject to power management (PM) through frameworks like run-time PM, device-frequency scaling (devfreq), CPU-frequency scaling (cpufreq), CPU idling (cpuidle), energy-aware scheduling (EAS), power capping, and thermal control. Obviously, this needs to be taken into account when the system goes into a sleep state. Some devices may need to be reconfigured, which may require accessing their registers, and they may need to be resumed to satisfy dependencies. On the way back to the working state, care must be taken to maintain consistency with working-state PM.
Dependencies between devices must be taken into account during transitions between the working state and sleep states. Obviously, children depend on their parents, but there are also dependencies between suppliers and consumers, represented in the kernel by device links. Dependent devices cannot be suspended after the devices they depend on and they cannot be resumed before those devices.
Three layers of code are involved in transitions between the working state and sleep states of the system. The PM core is responsible for the high-level flow control, the middle-layer code (bus types, classes, device types, PM domains) takes care of commonalities (to avoid duplication of code, among other things), and device drivers do device-specific handling. As a rule, the PM core invokes the middle-layer code that, in turn, invokes device drivers, but in the absence of the middle-layer code, the PM core can invoke device drivers directly.
There are four phases to both the suspend and resume processes. In the "prepare" phase of suspend, new children are prevented from being added under a given device and some general preparations take place, but hardware settings should not be adjusted at that point. As a general rule, device activity is expected to be stopped in the "suspend" phase; the "late suspend" and "suspend noirq" phases are expected to put hardware into low-power states.
Analogously, the "resume noirq" and "early resume" phases are generally expected to power-up hardware. If necessary, the "resume" phase is expected to restart device activity, and the "complete" phase reverses the actions carried out during the "prepare" phase. However, what exactly happens to a given device during all of those phases depends on the specific combination of the middle-layer code and the device driver handling it.
The "noirq" phases are so-called because interrupt handlers supplied by device drivers are not invoked during these phases. Interrupts are handled during that time in a special way such that interrupts involved in triggering wakeup events will cause the system to go back to the working state (resume). Run-time PM of devices is disabled during the "late suspend" phase and it is re-enabled during the "early resume" phase, so those phases can be referred to as "norpm" (no-run-time-PM) phases.
The handling of devices during transitions between the working state and sleep states of the system is coordinated with device run-time PM to some extent. The PM core freezes the run-time PM workqueue before the "prepare" phase and unfreezes it after the "complete" phase. It also increments the run-time PM usage counter of every device in the "prepare" phase and decrements that counter in the "complete" phase, so devices cannot run-time suspend during system-wide transitions, although they can run-time resume during the "prepare", "suspend", "resume", and "complete" phases.
Moreover, the PM core takes care of disabling and re-enabling run-time PM for every device during the "late suspend" and "early resume" phases, respectively. In turn, the middle-layer code and device drivers are expected to resume devices that cannot stay in run-time suspend during system transitions; they must also prevent devices that are not allowed to wake up the system from doing so.
All of this looks kind of impressive, Wysocki said, but there are issues with it. At this point, he showed a photo of the Leaning Tower of Pisa, to the visible amusement of the audience. Fortunately, he said, the Linux kernel's suspend and resume code is safely far from collapsing.
One of the issues that is currently being tackled is related to asynchronous suspend and resume of devices during system transitions between the working state and sleep states.
Generally speaking, there are devices that can be handled out of order with respect to any other devices so long as all of their known dependencies are met; they are referred to as "async" devices. The other devices, referred to as "sync" devices, must be handled in a specific order that is assumed to cover all of the dependencies, the known ones as well as the unknown ones, if any. Of course, the known dependencies between the async and sync devices, represented through parent-child relationships or by device links, must be taken into account as well.
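A device is only treated as async if some code opts it in; a driver can do so with device_enable_async_suspend(), typically at probe time, as in this illustrative sketch (foo_probe() is a placeholder name).

```c
/* Illustrative opt-in: mark a device's suspend/resume handling as async. */
#include <linux/pm.h>
#include <linux/platform_device.h>

static int foo_probe(struct platform_device *pdev)
{
	/* ... normal probe work ... */

	/*
	 * Tell the PM core that this device's system suspend and resume
	 * handling may run in a separate thread, ordered only by its
	 * known dependencies (parent-child relationships and device links).
	 */
	device_enable_async_suspend(&pdev->dev);
	return 0;
}
```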
Each of the suspend and resume phases walks through all of the devices in the system, including both the async and sync devices, and the problem is how to arrange that walk. For instance, the handling of all async devices may be started at the beginning of each phase (this is the way device resume code works in the mainline kernel), but then the threads handling them may need to wait for the known dependencies to be met, and starting all of those threads at the same time may stress the system. The processing of async devices may also be started after handling all of the preceding sync devices (this is the way device suspend code works in the mainline kernel), but, in that case, starting the handling of some async devices earlier may speed up the transition. That will happen if there are async devices without any known dependencies, for example.
There are other possibilities, and the working consensus appears to be that the handling of an async device should be started when some known dependencies are met for it (or it has no known dependencies at all). The question that remains is whether or not to wait until all known dependencies are met for an async device before starting the handling of it.
Regardless of the way the ordering issue is resolved, the handling of the slowest async device tends to take the majority of the time spent in each suspend and resume phase. Consequently, if there are three devices, each of which happens to be the slowest one in a different suspend phase, combining all of the phases into one would reduce the total suspend time. Along these lines of reasoning, reducing the number of suspend and resume phases overall, or moving "slow" device handling to the phases where there is other slow work already, may cause suspend and resume to become faster.
Another area of possible improvement is the integration of system transitions between the working state and sleep states with the run-time PM of devices. This integration is needed because leaving run-time suspended devices in suspend during system transitions may both save energy and reduce the system suspend and resume duration. However, it is not always viable, and drivers need to be prepared for this optimization so, if they want devices to be left in suspend, they need to opt in for that.
Currently, there are three ways to do so:
- Participate in the so-called "direct-complete" optimization, causing the handling during a system suspend and resume cycle to be skipped for a device if it is run-time-suspended to start with. Hence the name; all suspend and resume phases except for "prepare" and "complete" are skipped for those devices, so effectively they go directly from the "prepare" to the "complete" phase.
- Set the DPM_FLAG_SMART_SUSPEND driver flag.
- Use pm_runtime_force_suspend() as a system suspend callback.
Unfortunately, the first option is used rarely, and the other two are not compatible with each other (drivers generally cannot do both of them at the same time). Moreover, some middle-layer code only works with one of them.
Even if the driver opts in to leave the device in suspend, the device may still have to be resumed because of the wakeup configuration. Namely, run-time PM enables wakeup signaling for all devices that support it, so that run-time suspended devices can signal a need to take care of some event coming from the outside of the system. The power-management subsystem wants to be transparent and it doesn't want to miss any signal that may require the user's attention.
On the other hand, only some of the wakeup-capable devices are allowed to wake up the whole system from sleep states, because there are cases in which the system needs to stay in a sleep state until the user specifically wants it to resume (for example, a laptop with a closed lid in a bag). For this reason, if a wakeup-capable device is run-time suspended prior to a system transition into a sleep state, and it is not allowed to wake up the system from sleep, it may need to be resumed and reconfigured during that transition. For some devices, the wakeup setting may be adjusted without resuming them, but that is not a general rule.
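In driver code, this decision usually shows up as a check of device_may_wakeup() in the suspend path; the sketch below shows the common pattern rather than any particular driver (baz_chip and its helpers are placeholders).

```c
/*
 * Illustrative pattern: arm the device as a wakeup source only if it
 * is allowed to wake up the system, otherwise quiesce it completely.
 */
#include <linux/device.h>
#include <linux/interrupt.h>
#include <linux/pm_wakeup.h>

struct baz_chip {
	struct device *dev;
	int irq;
};

static int baz_suspend(struct device *dev)
{
	struct baz_chip *chip = dev_get_drvdata(dev);

	if (device_may_wakeup(dev))
		enable_irq_wake(chip->irq);	/* leave the wakeup path armed */
	else
		disable_irq(chip->irq);		/* fully quiesce the device */

	return 0;
}
```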
Apart from the above, there are dependencies on the platform firmware and on other devices that may require a given device to be resumed during a system transition into a sleep state. Usually, middle-layer code knows about those dependencies and it will act accordingly, but this means that drivers generally cannot decide by themselves what to do with the devices during those transitions and some cooperation between different parts of the code is required.
Leaving devices in suspend during a transition from a sleep state to the working state of the system may also be beneficial, but it is subject to analogous limitations.
Drivers that don't opt in for the direct-complete optimization may need to specifically opt in for leaving devices in suspend during system resume. If they use pm_runtime_force_suspend() as a suspend callback, they also need to use pm_runtime_force_resume() as a resume callback; this means that the device will be left in suspend unless it was in use prior to the preceding system suspend (that is, its run-time PM usage counter is nonzero or some of its children have been active at that time). If drivers set DPM_FLAG_SMART_SUSPEND, they also need to set DPM_FLAG_MAY_SKIP_RESUME to allow devices to be left in suspend.
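In a driver, the pm_runtime_force_suspend()/pm_runtime_force_resume() variant typically looks something like the following minimal sketch, where the bar_runtime_*() callbacks are placeholders for the real run-time PM work.

```c
/*
 * Minimal sketch of the pm_runtime_force_suspend()/resume() opt-in:
 * the run-time PM callbacks do the real work, and the system sleep
 * callbacks simply reuse them.
 */
#include <linux/pm.h>
#include <linux/pm_runtime.h>

static int bar_runtime_suspend(struct device *dev)
{
	/* Put the hardware into a low-power state. */
	return 0;
}

static int bar_runtime_resume(struct device *dev)
{
	/* Bring the hardware back up. */
	return 0;
}

static const struct dev_pm_ops bar_pm_ops = {
	RUNTIME_PM_OPS(bar_runtime_suspend, bar_runtime_resume, NULL)
	SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
};
```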
However, if a given device is not allowed to wake up the system from sleep, and it cannot be reconfigured without resuming, leaving it in suspend is not an option. Also, if the platform firmware powers up devices during system resume before passing control to the kernel, it is more useful to resume all of them and leave the subsequent PM handling to run-time PM.
All of this needs to be carefully put in order. Different driver opt-in variants need to be made to work with each other and with all middle-layer code. Clear criteria for resuming run-time suspended devices during system transitions between the working state and sleep states need to be agreed on and documented, and all middle-layer code needs to adhere to them. In particular, device_may_wakeup() needs to be taken into account by all middle-layer code and in the absence of it, by device drivers and the PM core.
In addition to the above, it can be observed that for all devices with run-time PM enabled, run-time PM callbacks should always be suitable for resuming them during transitions from system suspend into the working state unless they are left in suspend. In principle, some significant simplifications of device handling during system resume may result from this observation, but again this will require quite a bit of work.
Sched_ext: current status, future plans, and what's missing
Speakers: Andrea Righi (video) and Joel Fernandes (video)
This talk covered the status of sched_ext: a technology that allows schedulers to be implemented as BPF programs that are loaded at run time. The core functionality of sched_ext is now maintained in the kernel (after the merge that happened in 6.12) and it's following the regular development workflow like any other subsystem.
Individual schedulers, libraries, and tooling are maintained in a separate repository. This structure was intentionally chosen to encourage fast experimentation within each scheduler. While changes still go through a review process, this separation allows a quicker development process. There is also a significant portion of this shared code base that is written in Rust, mostly topology abstractions and architectural properties that are accessible from user space and can be shared with the BPF code using BPF maps.
The community of users and developers keeps growing, and the major Linux distributions have almost caught up, shipping recent-enough kernels along with packages for the main sched_ext schedulers.
An important question, raised by Juri Lelli, centered around the relationship with the kernel's completely fair scheduler (referred to here as "fair.c") and whether it's worthwhile to reuse some of its functionality to avoid code duplication. In fact, sched_ext, being implemented as a new scheduling class, includes its own implementation of a default scheduling policy. BPF-based schedulers can then override this default behavior by implementing specific callbacks. The default implementation in sched_ext could just reuse parts of fair.c where appropriate to minimize code duplication and allow users to build on a base that closely mirrors the kernel's default behavior.
However, reusing fair.c code is challenging due to its deep integration with various parts of the kernel scheduler. Features like energy and capacity awareness (EAS and CAS), which are not completely supported in sched_ext, complicate code reuse; introducing dependencies from sched_ext back into fair.c should also be avoided.
Given these challenges, the consensus for now is to keep sched_ext independent by reimplementing similar functionality within its core. In doing so, the goal is to remain as consistent as possible with fair.c, with the possibility of converging toward a shared code base in the future. This approach also presents an opportunity to revisit and possibly eliminate some legacy heuristics embedded in fair.c, making it a potentially beneficial process for everyone.
Another topic that was discussed is how to prevent starvation of SCHED_EXT tasks when a task running at a higher scheduling class is monopolizing a CPU. The proposed solution is to implement a deadline server, similar to the approach used to prevent starvation of SCHED_NORMAL tasks. This work is currently being handled by Joel Fernandes.
One of the sched_ext key features highlighted in the talk is its exit dump-trace functionality: when a scheduler encounters a critical error, the sched_ext core automatically unloads it, reverting to the default scheduler, and triggering the user-space scheduler program to emit a detailed trace containing diagnostic information. This mechanism also activates if a task is enqueued to a dispatch queue (a sched_ext run queue), but is not scheduled within a certain timeout, making it especially useful for detecting starvation scenarios.
Currently, there's no equivalent mechanism in fair.c to capture such traces. Thomas Gleixner suggested that we could achieve similar insights using tracepoints. Lelli added that, before the deadline server existed, the stalld daemon served a similar purpose: it monitored threads stuck in a run queue for too long without being scheduled, then temporarily boosted them using the SCHED_DEADLINE policy to grant them a small run-time slice. While the deadline server now can handle this in-kernel, stalld could still be used for its monitoring capabilities.
A potential integration with cpuidle was also discussed; Vincent Guittot pointed out that we can just use the cpuidle quality-of-service latency interface from user space, which is probably a reasonable solution, as it just involves some communication between BPF and user space and there's really no need to add a new specific sched_ext API for that.
The talk also briefly touched on the concept of tickless scheduling using sched_ext. A prototype scheduler (scx_tickless) exists; it routes all scheduling events to a designated subset of CPUs, while isolating the remaining CPUs. These isolated CPUs are managed to run a single task at a time with an effectively infinite time slice. If a context switch is needed, it is triggered via a BPF timer and handled by the manager CPUs using an inter-processor interrupt (allowing the scheduler to determine an arbitrary tick frequency, managed by the BPF timer). When combined with the nohz_full boot parameter, this approach enables the running of tasks on isolated CPUs with minimal noise from the kernel, which can be an appealing property for virtualization and high-performance workloads, where even small interruptions can impact performance.
That said, the general consensus from the audience was that the periodic tick typically introduces an overhead that is barely noticeable, so further testing and benchmarking will be necessary to validate the benefits of this approach.
Other upcoming features in sched_ext include the addition of richer topology abstractions within the core sched_ext subsystem and support for loading multiple sched_ext schedulers simultaneously in a hierarchical setup, integrated with cgroups.
What can EEVDF learn from a special-purpose scheduler? The case of scx_lavd
Speaker: Changwoo Min (video)
Min gave a talk on a gaming-focused, sched_ext-based scheduler, scx_lavd (which was also covered here in September 2024). The talk started with a quick overview of the scx_lavd scheduler and its goals. Scx_lavd is a virtual-deadline-based scheduler (like EEVDF) specialized for gaming workloads. This approach was chosen because a virtual deadline is a nice framework to express fairness and latency in a unified manner. Moreover, by sharing a common foundation, there could be opportunities for the two schedulers to share lessons learned and exchange ideas.
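As a rough illustration of how a virtual deadline unifies the two concerns, consider the minimal sketch below; the names, the weight scaling, and the numbers are illustrative assumptions, not EEVDF's or scx_lavd's actual code. A task that requests a shorter slice gets an earlier deadline (lower latency), while the weight term is what preserves long-term fairness.

    #include <stdio.h>

    /* Minimal sketch of a virtual-deadline calculation, loosely in the
     * spirit of EEVDF and scx_lavd; fields and scaling are assumptions. */
    struct task {
        const char *name;
        unsigned long long vtime;    /* virtual time consumed so far */
        unsigned int weight;         /* larger weight => larger CPU share */
        unsigned long long slice_ns; /* requested time slice */
    };

    /* A shorter requested slice yields an earlier deadline (lower latency);
     * the weight scaling is what preserves long-term fairness. */
    static unsigned long long virtual_deadline(const struct task *t)
    {
        return t->vtime + t->slice_ns * 1024ULL / t->weight;
    }

    int main(void)
    {
        struct task latency_critical = { "game-render", 1000000, 1024, 1000000 };
        struct task batch            = { "background",  1000000, 1024, 20000000 };

        printf("%s deadline: %llu\n", latency_critical.name,
               virtual_deadline(&latency_critical));
        printf("%s deadline: %llu\n", batch.name, virtual_deadline(&batch));
        return 0;
    }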
The technical goals of scx_lavd are low tail latency (and thus high frame rates in gaming), lower power consumption, and smarter use of heterogeneous processors (like ARM big.LITTLE). He added that, if scx_lavd achieves all three, it will be a better desktop scheduler, which is his stretch goal.
He clarified that the main target applications are unmodified Windows games running on the Proton/Wine layer, so it is hard to expect additional latency hints from the application. An audience member asked whether Windows provides an interface for specifying latency requirements. Min answered that it does, and that, if a game or a game engine provides latency hints, that information can be handed down to scx_lavd through the Proton/Wine layer.
Games are communication-intensive; 10-20 tasks can easily be involved in finishing a single job (such as updating the display after a button press), and they communicate through primitives such as futexes, epoll, and NTSync. A scheduling delay in any one of these tasks can cause cascading delays and latency (frame-time) spikes.
The key question is how to determine which tasks are latency-critical. Min explained that a task in the middle of a task chain is latency-critical, so scx_lavd gives a shorter deadline to such a task, causing it to execute sooner. To decide whether a task is in the middle of a task chain, scx_lavd measures how frequently a task is blocked waiting for an event (blocking frequency) and how often a task wakes up another task (wakeup frequency). High blocking frequency means that the task usually serves as a consumer in a task chain, and high wakeup frequency indicates that the task frequently serves as a producer. Tasks with both high blocking and wakeup frequencies are in the middle of the chain somewhere.
Participants asked about memory consumption (potentially proportional to the square of the number of tasks), the time to reach the steady state, how to decay those frequencies, and the relationship to proxy execution. Min answered that it simply measures the frequencies without distinguishing individual wakers and wakees, so it is pretty cheap. Those frequencies are decayed using the standard exponential weighted moving average (EWMA) technique, converging quickly (a few hundred milliseconds) in practice. Also, compared to proxy execution, which strictly tracks a lock holder and its waiters, scx_lavd's approach is much looser in tracking task dependencies.
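A minimal sketch of how such per-task frequencies might be tracked and combined is shown below; the decay factor, the thresholds, and the field names are assumptions for illustration, not scx_lavd's actual code.

    #include <stdio.h>

    /* Exponentially weighted moving averages of how often a task blocks
     * (consumer side) and how often it wakes another task (producer side).
     * Values are events per second; the smoothing factor is an assumption. */
    struct task_stats {
        double block_freq;
        double wakeup_freq;
    };

    static void ewma_update(double *avg, double sample)
    {
        const double alpha = 0.25;  /* assumed smoothing factor */
        *avg = alpha * sample + (1.0 - alpha) * *avg;
    }

    /* A task that both blocks often and wakes others often sits in the
     * middle of a task chain, so treat it as latency-critical. */
    static int latency_critical(const struct task_stats *s)
    {
        const double threshold = 100.0; /* assumed, events per second */
        return s->block_freq > threshold && s->wakeup_freq > threshold;
    }

    int main(void)
    {
        struct task_stats s = { 0.0, 0.0 };

        for (int i = 0; i < 20; i++) {  /* the EWMA converges in a few steps */
            ewma_update(&s.block_freq, 250.0);
            ewma_update(&s.wakeup_freq, 300.0);
        }
        printf("latency critical: %s\n", latency_critical(&s) ? "yes" : "no");
        return 0;
    }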
After explaining how scx_lavd identifies and boosts latency-critical tasks, Min showed a video demo of a game that achieves high, stable frame rates while a background job runs. That led to further discussion of scx_lavd's findings. Peter Zijlstra mentioned that the determination of latency-critical tasks is something that could be considered for the mainline scheduler, but breaking fairness is not.
Min moved on to how scx_lavd reduces power consumption. He is particularly interested in cases where the system is under-utilized (say, 20-30% CPU utilization), such as when running an old, casual game. He explained the idea of core compaction, which limits the number of actively used CPUs according to the system load, allowing inactive CPUs to stay longer in deeper idle states and saving power. The relevance of EAS was discussed; it was also suggested that core compaction should consult the energy model to make more accurate decisions on a broader variety of processors.
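A toy version of the core-compaction calculation might look like the following; the headroom factor and function names are assumptions, not the scheduler's actual code.

    #include <stdio.h>

    /* Given the system's aggregate utilization, pick how many CPUs to keep
     * active; the rest can stay in deep idle states.  The headroom factor
     * is an assumption for illustration. */
    static int nr_active_cpus(double total_util, int nr_cpus)
    {
        const double headroom = 1.25;   /* keep ~20% slack, assumed */
        int n = (int)(total_util * nr_cpus * headroom + 0.999);

        if (n < 1)
            n = 1;
        if (n > nr_cpus)
            n = nr_cpus;
        return n;
    }

    int main(void)
    {
        /* A 20%-utilized 16-CPU system would be compacted onto 4 CPUs. */
        printf("active CPUs: %d\n", nr_active_cpus(0.20, 16));
        return 0;
    }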
Reduce, reuse, recycle: propagating load-balancer statistics up the hierarchy
Speaker: Prateek Nayak Kumbla (video)
With growing core counts, the overhead of newidle balancing (load balancing performed when a CPU is about to enter the idle state) has become a scalability concern on large deployments. Over the past couple of years, strategies such as ILB_UTIL and SHARED_RUNQ have been proposed in the community to reduce the cost of idle balancing and to make it more efficient. This talk covered a new approach to optimizing load balancing by reducing the cycles spent in its hottest function, update_sd_lb_stats().
The talk started by showing the benefits of newidle balancing by simply bypassing it; that made almost all of the workloads tested unhappy. The frequency and opportunistic nature of newidle balancing ensure that imbalances are checked often; as a result, load is balanced opportunistically before the periodic balancer kicks in.
update_sd_lb_stats(), which is called at the beginning of every load-balancing attempt, iterates over all of the groups of a scheduling domain, calling update_sg_lb_stats(), which, in turn, iterates over all of the CPUs in the group and aggregates the load-balancing statistics. When balancing iterates over multiple domains, as is regularly the case with newidle balancing, the statistics computed at a lower domain are never reused; they are always computed again, even though the passes happen back-to-back with no delay between them.
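In simplified form, the structure of that walk looks something like the sketch below (a user-space illustration, not the kernel's code; the real functions aggregate far more than a load sum). The point is that the per-CPU work done for a lower domain is repeated from scratch when a higher domain is balanced moments later.

    #include <stdio.h>

    /* Simplified shape of the load-balancing statistics walk: for each
     * scheduling domain, walk its groups, and for each group walk its CPUs.
     * Only a load sum is aggregated here, purely for illustration. */
    #define NR_CPUS 8

    static unsigned long cpu_load[NR_CPUS] = { 3, 1, 4, 1, 5, 9, 2, 6 };

    struct sched_group { int first_cpu, nr_cpus; };
    struct sched_domain { struct sched_group groups[NR_CPUS]; int nr_groups; };

    static unsigned long update_sg_lb_stats(const struct sched_group *sg)
    {
        unsigned long load = 0;

        for (int cpu = sg->first_cpu; cpu < sg->first_cpu + sg->nr_cpus; cpu++)
            load += cpu_load[cpu];
        return load;
    }

    static unsigned long update_sd_lb_stats(const struct sched_domain *sd)
    {
        unsigned long total = 0;

        for (int g = 0; g < sd->nr_groups; g++)
            total += update_sg_lb_stats(&sd->groups[g]);
        return total;
    }

    int main(void)
    {
        /* A toy two-level hierarchy: four 2-CPU groups, then two 4-CPU
         * groups.  Balancing at the higher domain recomputes what the
         * lower one just computed. */
        struct sched_domain mc  = { { {0,2}, {2,2}, {4,2}, {6,2} }, 4 };
        struct sched_domain pkg = { { {0,4}, {4,4} }, 2 };

        printf("MC domain load:  %lu\n", update_sd_lb_stats(&mc));
        printf("PKG domain load: %lu\n", update_sd_lb_stats(&pkg));
        return 0;
    }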
The new approach being proposed enables statistics reuse by propagating statistics aggregated at a lower domain when load balancing at a higher domain. This approach was originally designed to reduce the overheads of busy periodic balancing; Kumbla presented the pitfalls of using it for newidle balancing.
Using data from perf sched stats with the sched-messaging benchmark as the workload, it was noted that aggressively reusing statistics without any invalidation can lead to newidle balancing converging on groups that are no longer busy. The data also showed a dramatic reduction in newidle-balancing cost, which was promising. Even with a naïve invalidation strategy, the regression in several workloads remained, which prompted further investigation. It was noted that the idle_cpu() check in the scheduler first checks whether the currently running task is the swapper task. Newidle balancing is done prior to a context switch, and a long time spent there can confuse the wakeup path by making the CPU appear busy. Kumbla noted that perhaps the ttwu_pending bit could be reused to signal all types of wakeups, removing the check for the swapper task from idle_cpu().
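One way to picture the reuse-with-invalidation idea is the sketch below; the age-based invalidation rule and the threshold are assumptions for illustration, and the real patches propagate richer statistics with more involved invalidation criteria.

    #include <stdbool.h>
    #include <stdio.h>

    /* Sketch of reusing a lower domain's aggregated statistics at a higher
     * domain, with a simple age-based invalidation. */
    struct cached_stats {
        unsigned long load;
        unsigned long long stamp_ns;
        bool valid;
    };

    static bool stats_reusable(const struct cached_stats *c,
                               unsigned long long now_ns)
    {
        const unsigned long long max_age_ns = 500000; /* 0.5ms, assumed */
        return c->valid && (now_ns - c->stamp_ns) < max_age_ns;
    }

    int main(void)
    {
        struct cached_stats mc = { .load = 42, .stamp_ns = 1000000, .valid = true };

        /* Shortly afterwards, the higher domain can reuse the aggregate... */
        printf("reuse at +0.1ms: %d\n", stats_reusable(&mc, 1100000));
        /* ...but stale data would steer balancing at groups no longer busy. */
        printf("reuse at +2ms:   %d\n", stats_reusable(&mc, 3000000));
        return 0;
    }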
Zijlstra noted that perhaps Guittot's push task mechanism can be used to redesign the idle and newidle balancing, and the statistics propagation can help reduce the overheads of busy-load balancing. Guittot mentioned an example implementation that uses a CPU mask to keep track of all the busy CPUs to pull from and idle CPUs to push tasks to. A prototype push approach was posted soon after OSPM as an RFC to flesh out the implementation details.
Zijlstra also noted that, during busy balancing, it is always the first CPU of the group that does the work for the domain, but perhaps that burden could be rotated among all the CPUs of the domain. There was some discussion of load-balancing intervals and how the statistics propagation would require aligning them for better efficiency. Kumbla noted that the prototype already contains a few tricks to align the intervals, but that it could be further improved.
Fernandes questioned whether the statistics can still be considered valid if tasks were moved at a lower domain. It was noted that reusing statistics should be safe for busy-load balancing, since only load or utilization is migrated, and the aggregates of these statistics will remain the same even if tasks are moved at lower domains.
Julia Lawall asked if there have been any pathological cases where statistics propagation has backfired; Kumbla replied that busy balancing is so infrequent compared to newidle balancing that a single wrong decision is very unlikely to have any impact. Kumbla also asked for more testing to ensure that there are no loopholes in the logic.
The talk went on to discuss yet another strategy to optimize newidle balancing, which introduces a fast path that tracks the busiest CPU in the last-level cache (LLC) domain and tries to pull load from that CPU first. It was noted that, despite yielding some benefit at lower utilization, the fast path falls apart when multiple newidle-balance operations run concurrently and lock contention on the busiest CPU leads to diminishing returns.
The talk finished by discussing SIS_NODE, which expands the wakeup search space beyond the LLC domain to the entire NUMA node. It was noted that, despite looking promising at lower utilization, SIS_NODE quickly falls apart at higher utilization, where the overhead of the larger search space becomes evident whenever it fails to find an idle CPU. A guard like SIS_UTIL is required as a prerequisite to make it viable, but its implementation remains a challenge, especially in the face of bursty workloads and the ever-growing size of the node domain.
Hierarchical CBS with deadline servers
Speakers: Luca Abeni, Yuri Andriaccio (video)
This talk presented a new implementation of the hierarchical constant bandwidth server (HCBS), an extension of the constant bandwidth server that allows scheduling multiple independent realtime applications through control groups while providing temporal-isolation guarantees. HCBS will allow realtime applications inside control groups to be scheduled using the SCHED_FIFO and SCHED_RR scheduling policies.
In HCBS, control groups are scheduled through SCHED_DEADLINE, using the deadline-server mechanism. Each group is associated with a bandwidth reservation (over a specified period), which is distributed among all CPUs. Whenever a control group is deemed runnable, the scheduler is recursively invoked to pick the realtime task to schedule.
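Conceptually, the two-level decision can be pictured as in the sketch below: the earliest-deadline group server is chosen first, then the group's own fixed-priority queue picks the task to run. This is an illustrative user-space sketch under assumed names, not the proposed kernel code.

    #include <stdio.h>

    /* Two-level pick, in the spirit of HCBS: SCHED_DEADLINE chooses among
     * per-group deadline servers (earliest deadline first), then the chosen
     * group's fixed-priority queue picks the actual realtime task. */
    struct rt_task { const char *name; int prio; };   /* higher = more urgent */
    struct group {
        const char *name;
        unsigned long long deadline_ns;  /* deadline server's current deadline */
        struct rt_task tasks[4];
        int nr_tasks;
    };

    static const struct rt_task *pick_in_group(const struct group *g)
    {
        const struct rt_task *best = NULL;

        for (int i = 0; i < g->nr_tasks; i++)
            if (!best || g->tasks[i].prio > best->prio)
                best = &g->tasks[i];
        return best;
    }

    static const struct group *pick_group(const struct group *gs, int n)
    {
        const struct group *best = NULL;

        for (int i = 0; i < n; i++)
            if (!best || gs[i].deadline_ns < best->deadline_ns)
                best = &gs[i];
        return best;
    }

    int main(void)
    {
        struct group groups[2] = {
            { "appA", 2000000, { {"control", 80}, {"logger", 10} }, 2 },
            { "appB", 1500000, { {"sensor", 50} }, 1 },
        };
        const struct group *g = pick_group(groups, 2);

        printf("run %s from group %s\n", pick_in_group(g)->name, g->name);
        return 0;
    }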
The proposed mechanism can be used for various purposes, such as running multiple independent realtime applications on the same machine with the guarantee that they cannot interfere with each other, or providing access to realtime scheduling policies inside control groups while enforcing bandwidth reservation and control for those policies.
The proposed scheduler aims to replace and improve upon the existing RT_GROUP_SCHED scheduler, reducing its invasiveness in the scheduler code and addressing a number of problems:
- HCBS uses SCHED_DEADLINE and the deadline-server mechanism to enforce bandwidth allocations, thus removing all the custom code RT_GROUP_SCHED uses. The deferred behavior of the deadline server must not be used in HCBS, which is different from how deadline servers are used to enforce run time for SCHED_OTHER tasks.
- HCBS reuses the non-control-group code of the realtime scheduling classes to implement the local scheduler, with a few additional checks, to be as non-invasive as possible.
- The use of deadline servers solves the "deferrable server" issue of the RT_GROUP_SCHED scheduler.
- HCBS removes RT_GROUP_SCHED's run-time migration mechanism and only migrates tasks: tasks are moved from CPUs that have exhausted their run time to others that still have time available, allowing the allocated bandwidth to be fully exploited.
- The HCBS scheduler has strong theoretical foundations. If users allocate an appropriate budget (computed by using realtime analysis), then it will be possible to guarantee respect for the application's temporal constraints.
- It also performs admission control to guarantee that it can effectively provide the requested bandwidth.
The current patch set is based on kernel version 6.13, but it is not complete yet. It passes most of the Linux Test Project tests and other custom-tailored stress tests; tests with rt-app produce results consistent with realtime theory.
Some of the more arbitrary implementation decisions were discussed with the OSPM audience:
- The HCBS scheduler should only be available for the version-2 control group hierarchy.
- The bandwidth enforcement should not affect the root control group, to keep the current implementation of realtime policies.
- Tasks should only be allowed to run in leaf groups. Non-leaf control groups are only used to enforce partitioning of CPU time.
- Multi-CPU run-time allocation should follow the allowed CPU mask of the control group (cpuset.cpus file); disabled CPUs should not have run time allocated.
- The assignment of different run times for a given set of CPUs is currently done through the rt_multi_runtime_us knob, but reusing the standard rt_runtime_us knob has been suggested.
- Run-time migration of RT_GROUP_SCHED tasks has been removed to prevent over-commitment or CPU starvation. It has been suggested to look into solutions to perform such migration whenever possible to prevent unnecessary context switches.
As pointed out in the discussion, the scheduling mechanism may behave counter-intuitively when over-committed. Suppose a control group is allocated 0.5 bandwidth on each of two CPUs, and two FIFO tasks run inside it: the first with priority 99 and a usage of 0.8, the second with priority 50 and a usage of 0.5, for a total usage of 1.3 against an allocated bandwidth of 1.0. If the CPUs activate in parallel, both tasks will run and, together, will consume all of the available bandwidth; the priority-50 task will get its full requested bandwidth, while the priority-99 task, despite its higher priority, will consume only 0.5 of its 0.8 usage. The result may also vary with a different distribution of the bandwidth across the same number of CPUs.
The expected behavior, instead, would be for higher-priority tasks to have first claim on the total CPU bandwidth; in this case, the priority-99 task should always get its full bandwidth. Since these situations only arise when over-committing, and thus fall outside the theoretical analysis, they should not pose a problem.
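The arithmetic of that example can be spelled out as follows; this is just the worked calculation from the discussion, with the per-CPU budget and task demands as given above.

    #include <stdio.h>

    /* Worked version of the over-commitment example: the group gets 0.5
     * bandwidth on each of two CPUs; if the two tasks end up running in
     * parallel on separate CPUs, each is capped by that CPU's 0.5 budget,
     * regardless of priority.  Purely illustrative arithmetic. */
    static double cap(double demand, double budget)
    {
        return demand < budget ? demand : budget;
    }

    int main(void)
    {
        const double per_cpu_budget = 0.5;
        const double prio99_demand = 0.8, prio50_demand = 0.5;

        /* prio 99 wants 0.8 but gets 0.5; prio 50 gets its full 0.5. */
        printf("prio 99 task: wants %.1f, gets %.1f\n",
               prio99_demand, cap(prio99_demand, per_cpu_budget));
        printf("prio 50 task: wants %.1f, gets %.1f\n",
               prio50_demand, cap(prio50_demand, per_cpu_budget));
        return 0;
    }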
Brief items
Kernel development
Kernel release status
The 6.15 kernel is out, having been released on May 25. Linus noted:
So this was delayed by a couple of hours because of a last-minute bug report resulting in one new feature being disabled at the eleventh hour, but 6.15 is out there now.
Significant changes in 6.15 include smarter timer-ID assignment to make checkpoint/restore operations more reliable, the ability to read status information from a pidfd after the process in question has been reaped, the PIDFD_SELF special pidfd value, nested ID-mapped mounts, zero-copy network-data reception via io_uring, the ability to read epoll events via io_uring, resilient queued spinlocks for BPF programs, guard-page enhancements allowing them to be placed in file-backed memory areas and for user space to detect their presence, the once-controversial fwctl subsystem, the optional sealing of some system mappings, and much more.
See the LWN merge-window summaries (part 1, part 2) and the in-progress KernelNewbies 6.15 page for more information.
Stable updates: 6.14.8, 6.12.30, 6.6.92, 6.1.140, and 5.15.184 were released on May 22.
The 6.14.9 and 6.12.31 updates are in the review process; they are due on May 29.
Quote of the week
Nova Core is in the infamous position of being the first driver to have been merged with the upstream kernel Linux that is written in Rust and that loads blobs. We set out to clean it up, and we did, but... we don't speak Rust, so we've broken it in the process. Now, that's not so unconventional, is it? :-)
— "Freedo" releases Linux-libre 6.15-gnu
Distributions
AlmaLinux OS 10.0 released
Version 10 of the AlmaLinux OS distribution has been released.
The goal of AlmaLinux OS is to support our community, and AlmaLinux OS 10 is the best example of that yet. With an unwavering eye on maintaining compatibility with Red Hat Enterprise Linux (RHEL), we have made small improvements to AlmaLinux OS 10 that target specific sections of our userbase.
See the release notes for details.
Fedora Council overturns FESCo provenpackager decision
The Fedora Council has ruled on the Fedora Engineering Steering Council's (FESCo) decision last year to revoke Peter Robinson's provenpackager status. In a statement published to the fedora-devel-announce mailing list, the council has announced that it has overturned FESCo's decision:
FESCo didn't have a specific policy for dealing with a request to remove Proven Packager rights. In addition, the FESCo process was handled entirely in private. The contributor didn't receive a formal notification or warning from FESCo, and felt blindsided by the official decision when and how it was announced. The Fedora Council would like to extend our sincerest apology on behalf of the Fedora Project to them.
LWN covered the story in December 2024.
Launchpad mailing lists going away
Canonical's Launchpad software-collaboration platform, which is used for Ubuntu development, will be shutting down its hosted mailing lists at the end of October. The announcement recommends Discourse or Launchpad Answers as alternatives. Ubuntu's mailing lists are unaffected by the change.
NixOS 25.05 released
Version 25.05 of the NixOS distribution has been released. Changes include support for the COSMIC desktop environment (reviewed here in August), GNOME 48, a 6.12 kernel, and many new modules; see the release notes for details. (Thanks to Pavel Roskin).
Development
Home Assistant deprecates the "core" and "supervised" installation modes
Our recent article on Home Assistant observed that the project emphasizes installations using its own Linux distribution or within containers. The project has now made that emphasis rather stronger with this announcement of the deprecation of the "core" and "supervised" installation modes, which allowed Home Assistant to be installed as an ordinary application on a Linux system.
These are advanced installation methods, with only a small percentage of the community opting to use them. If you are using these methods, you can continue to do so (you can even continue to update your system), but in six months time, you will no longer be supported, which I'll explain the impacts of in the next section. References to these installation methods will be removed from our documentation after our next release (2025.6).
Support for 32-bit Arm and x86 architectures has also been deprecated.
Mozilla is shutting down Pocket
Mozilla has announced that it is shutting down Pocket, a bookmarking service acquired by Mozilla in 2017, this coming July. "Pocket has helped millions save articles and discover stories worth reading. But the way people use the web has evolved, so we're channeling our resources into projects that better match their browsing habits and online needs."
Development quotes of the week
To link this back to actual Unix history (or something much nearer that), I realized that `bullshit generator' was a reasonable summary of what LLMs do after also realizing that an LLM is pretty much just a much-fancier and better-automated descendant of Mark V Shaney: https://en.wikipedia.org/wiki/Mark_V._Shaney
— Norman Wilson
My name is Rob Pike and I approve this message.
— Rob Pike, in reply to Wilson
Page editor: Daroc Alden
Announcements
Newsletters
Distributions and system administration
Development
Meeting minutes
Calls for Presentations
CFP Deadlines: May 29, 2025 to July 28, 2025
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
Deadline | Event Dates | Event | Location |
---|---|---|---|
June 11 | August 16 to August 17 | Free and Open Source Software Conference | Sankt Augustin, Germany |
June 13 | September 30 to October 1 | All Systems Go! 2025 | Berlin, Germany |
June 13 | October 17 to October 19 | OpenInfra Summit Europe 2025 | Paris-Saclay, France |
June 15 | July 14 to July 20 | DebConf 2025 | Brest, France |
June 15 | November 7 to November 8 | Seattle GNU/Linux Conference | Seattle, US |
June 20 | August 29 to August 31 | openSUSE.Asia Summit | Faridabad, India |
June 30 | November 7 to November 8 | South Tyrol Free Software Conference | Bolzano, Italy |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Events: May 29, 2025 to July 28, 2025
The following event listing is taken from the LWN.net Calendar.
Date(s) | Event | Location |
---|---|---|
June 5 to June 8 | Flock to Fedora 2025 | Prague, Czech Republic |
June 12 to June 14 | DevConf.CZ | Brno, Czech Republic |
June 13 to June 15 | SouthEast LinuxFest | Charlotte, NC, US |
June 15 to June 17 | Berlin Buzzwords | Berlin, Germany |
June 23 to June 25 | Open Source Summit North America | Denver, CO, US |
June 26 to June 28 | Linux Audio Conference | Lyon, France |
June 26 to June 27 | Linux Security Summit North America | Denver, CO, US |
June 26 to June 28 | openSUSE Conference | Nuremberg, Germany |
July 1 to July 3 | Pass the SALT Conference | Lille, France |
July 14 to July 20 | DebConf 2025 | Brest, France |
July 16 to July 18 | EuroPython | Prague, Czech Republic |
July 24 to July 29 | GUADEC 2025 | Brescia, Italy |
If your event does not appear here, please tell us about it.
Security updates
Alert summary May 22, 2025 to May 28, 2025
Dist. | ID | Release | Package | Date |
---|---|---|---|---|
AlmaLinux | ALSA-2025:7395 | 9 | 389-ds-base | 2025-05-26 |
AlmaLinux | ALSA-2025:7422 | 9 | ghostscript | 2025-05-26 |
AlmaLinux | ALSA-2025:7893 | 9 | grafana | 2025-05-26 |
AlmaLinux | ALSA-2025:8201 | 8 | gstreamer1-plugins-bad-free | 2025-05-27 |
AlmaLinux | ALSA-2025:8183 | 9 | gstreamer1-plugins-bad-free | 2025-05-27 |
AlmaLinux | ALSA-2025:8056 | 8 | kernel | 2025-05-21 |
AlmaLinux | ALSA-2025:8246 | 8 | kernel | 2025-05-28 |
AlmaLinux | ALSA-2025:7423 | 9 | kernel | 2025-05-26 |
AlmaLinux | ALSA-2025:7903 | 9 | kernel | 2025-05-26 |
AlmaLinux | ALSA-2025:8057 | 8 | kernel-rt | 2025-05-21 |
AlmaLinux | ALSA-2025:8132 | 8 | libsoup | 2025-05-26 |
AlmaLinux | ALSA-2025:8126 | 9 | libsoup | 2025-05-26 |
AlmaLinux | ALSA-2025:7425 | 9 | osbuild-composer | 2025-05-26 |
AlmaLinux | ALSA-2025:8136 | 9 | python-tornado | 2025-05-27 |
AlmaLinux | ALSA-2025:8046 | 8 | webkit2gtk3 | 2025-05-21 |
Arch Linux | ASA-202505-14 | bind | 2025-05-27 | |
Arch Linux | ASA-202505-13 | varnish | 2025-05-27 | |
Debian | DLA-4181-1 | LTS | glibc | 2025-05-27 |
Debian | DSA-5924-1 | stable | intel-microcode | 2025-05-23 |
Debian | DLA-4178-1 | LTS | kernel | 2025-05-25 |
Debian | DSA-5925-1 | stable | kernel | 2025-05-24 |
Debian | DLA-4179-1 | LTS | libavif | 2025-05-26 |
Debian | DLA-4177-1 | LTS | libphp-adodb | 2025-05-24 |
Debian | DLA-4176-1 | LTS | openssl | 2025-05-24 |
Debian | DLA-4180-1 | LTS | pgbouncer | 2025-05-27 |
Debian | DLA-4182-1 | LTS | syslog-ng | 2025-05-28 |
Fedora | FEDORA-2025-d62bbb5261 | F41 | dotnet8.0 | 2025-05-25 |
Fedora | FEDORA-2025-3f807ca531 | F42 | dotnet8.0 | 2025-05-25 |
Fedora | FEDORA-2025-75bda8d944 | F41 | dotnet9.0 | 2025-05-23 |
Fedora | FEDORA-2025-a54ca28d07 | F42 | dotnet9.0 | 2025-05-23 |
Fedora | FEDORA-2025-86022c9c44 | F42 | dropbear | 2025-05-23 |
Fedora | FEDORA-2025-d5e2376a90 | F41 | ghostscript | 2025-05-24 |
Fedora | FEDORA-2025-db5caba0cc | F42 | ghostscript | 2025-05-23 |
Fedora | FEDORA-2025-7e1b66f54e | F41 | iputils | 2025-05-24 |
Fedora | FEDORA-2025-abf317121e | F42 | microcode_ctl | 2025-05-28 |
Fedora | FEDORA-2025-b0f2570b61 | F41 | mozilla-ublock-origin | 2025-05-28 |
Fedora | FEDORA-2025-01794be9b3 | F42 | mozilla-ublock-origin | 2025-05-22 |
Fedora | FEDORA-2025-bc02ec32fb | F41 | nbdkit | 2025-05-26 |
Fedora | FEDORA-2025-8a2d82f65a | F42 | nbdkit | 2025-05-23 |
Fedora | FEDORA-2025-0c2b7a8f32 | F41 | nodejs20 | 2025-05-28 |
Fedora | FEDORA-2025-2936dece0e | F42 | nodejs20 | 2025-05-28 |
Fedora | FEDORA-2025-61ad6e65b3 | F41 | nodejs22 | 2025-05-28 |
Fedora | FEDORA-2025-f4cee58e97 | F42 | nodejs22 | 2025-05-28 |
Fedora | FEDORA-2025-a6305306dd | F41 | open-vm-tools | 2025-05-25 |
Fedora | FEDORA-2025-8896dcbcd0 | F41 | openssh | 2025-05-23 |
Fedora | FEDORA-2025-e5d435516f | F41 | python-watchfiles | 2025-05-23 |
Fedora | FEDORA-2025-e6c12e820e | F42 | python-watchfiles | 2025-05-23 |
Fedora | FEDORA-2025-f566d6a4ad | F41 | rpm-ostree | 2025-05-23 |
Fedora | FEDORA-2025-6a67917349 | F41 | sudo-rs | 2025-05-22 |
Fedora | FEDORA-2025-c62d1a4879 | F42 | sudo-rs | 2025-05-22 |
Fedora | FEDORA-2025-ee55907675 | F41 | thunderbird | 2025-05-24 |
Fedora | FEDORA-2025-32d6feec91 | F42 | thunderbird | 2025-05-25 |
Fedora | FEDORA-2025-510a78f439 | F41 | vyper | 2025-05-25 |
Fedora | FEDORA-2025-4acdb9a1bd | F42 | vyper | 2025-05-25 |
Fedora | FEDORA-2025-72469000ed | F41 | yelp | 2025-05-23 |
Fedora | FEDORA-2025-72469000ed | F41 | yelp-xsl | 2025-05-23 |
Fedora | FEDORA-2025-8365ba2261 | F41 | zsync | 2025-05-23 |
Fedora | FEDORA-2025-6f6043cb99 | F42 | zsync | 2025-05-23 |
Mageia | MGASA-2025-0159 | 9 | chromium-browser-stable | 2025-05-23 |
Mageia | MGASA-2025-0165 | 9 | firefox, nss, rootcerts | 2025-05-27 |
Mageia | MGASA-2025-0164 | 9 | glibc | 2025-05-25 |
Mageia | MGASA-2025-0163 | 9 | iputils | 2025-05-25 |
Mageia | MGASA-2025-0160 | 9 | microcode | 2025-05-23 |
Mageia | MGASA-2025-0161 | 9 | nodejs | 2025-05-25 |
Mageia | MGASA-2025-0166 | 9 | open-vm-tools | 2025-05-27 |
Mageia | MGASA-2025-0167 | 9 | sqlite3 | 2025-05-27 |
Mageia | MGASA-2025-0168 | 9 | thunderbird | 2025-05-27 |
Mageia | MGASA-2025-0162 | 9 | zsync | 2025-05-25 |
Oracle | ELSA-2025-7589 | OL8 | .NET 8.0 | 2025-05-21 |
Oracle | ELSA-2025-7598 | OL9 | .NET 8.0 | 2025-05-23 |
Oracle | ELSA-2025-7600 | OL9 | .NET 9.0 | 2025-05-23 |
Oracle | ELSA-2025-7395 | OL9 | 389-ds-base | 2025-05-23 |
Oracle | ELSA-2025-7437 | OL9 | avahi | 2025-05-23 |
Oracle | ELSA-2025-7389 | OL9 | buildah | 2025-05-23 |
Oracle | ELSA-2025-7895 | OL8 | compat-openssl10 | 2025-05-21 |
Oracle | ELSA-2025-7937 | OL9 | compat-openssl11 | 2025-05-23 |
Oracle | ELSA-2025-7444 | OL9 | expat | 2025-05-23 |
Oracle | ELSA-2025-8060 | OL8 | firefox | 2025-05-22 |
Oracle | ELSA-2025-7428 | OL9 | firefox | 2025-05-23 |
Oracle | ELSA-2025-8049 | OL9 | firefox | 2025-05-23 |
Oracle | ELSA-2025-7422 | OL9 | ghostscript | 2025-05-23 |
Oracle | ELSA-2025-7586 | OL9 | ghostscript | 2025-05-23 |
Oracle | ELSA-2025-7417 | OL9 | gimp | 2025-05-23 |
Oracle | ELSA-2025-7409 | OL9 | git | 2025-05-23 |
Oracle | ELSA-2025-7894 | OL8 | grafana | 2025-05-21 |
Oracle | ELSA-2025-7404 | OL9 | grafana | 2025-05-23 |
Oracle | ELSA-2025-7893 | OL9 | grafana | 2025-05-23 |
Oracle | ELSA-2025-8183 | OL9 | gstreamer1-plugins-bad-free | 2025-05-27 |
Oracle | ELSA-2025-7416 | OL9 | gvisor-tap-vsock | 2025-05-23 |
Oracle | ELSA-2025-8056 | OL8 | kernel | 2025-05-22 |
Oracle | ELSA-2025-7423 | OL9 | kernel | 2025-05-27 |
Oracle | ELSA-2025-7903 | OL9 | kernel | 2025-05-27 |
Oracle | ELSA-2025-7436 | OL9 | libsoup | 2025-05-23 |
Oracle | ELSA-2025-8126 | OL9 | libsoup | 2025-05-27 |
Oracle | ELSA-2025-7410 | OL9 | libxslt | 2025-05-23 |
Oracle | ELSA-2025-7419 | OL9 | mod_auth_openidc | 2025-05-23 |
Oracle | ELSA-2025-7402 | OL9 | nginx | 2025-05-23 |
Oracle | ELSA-2025-7426 | OL9 | nodejs:20 | 2025-05-23 |
Oracle | ELSA-2025-7433 | OL9 | nodejs:22 | 2025-05-27 |
Oracle | ELSA-2025-7967 | OL8 | osbuild-composer | 2025-05-21 |
Oracle | ELSA-2025-7425 | OL9 | osbuild-composer | 2025-05-23 |
Oracle | ELSA-2025-7431 | OL9 | php | 2025-05-27 |
Oracle | ELSA-2025-7432 | OL9 | php:8.2 | 2025-05-27 |
Oracle | ELSA-2025-7418 | OL9 | php:8.3 | 2025-05-27 |
Oracle | ELSA-2025-7391 | OL9 | podman | 2025-05-23 |
Oracle | ELSA-2025-8136 | OL9 | python-tornado | 2025-05-27 |
Oracle | ELSA-2025-7438 | OL9 | redis | 2025-05-27 |
Oracle | ELSA-2025-7686 | OL8 | redis:6 | 2025-05-21 |
Oracle | ELSA-2025-7429 | OL9 | redis:7 | 2025-05-27 |
Oracle | ELSA-2025-7539 | OL8 | ruby:2.5 | 2025-05-21 |
Oracle | ELSA-2025-7397 | OL9 | skopeo | 2025-05-23 |
Oracle | ELSA-2025-7435 | OL9 | thunderbird | 2025-05-23 |
Oracle | ELSA-2025-7440 | OL9 | vim | 2025-05-23 |
Oracle | ELSA-2025-8046 | OL8 | webkit2gtk3 | 2025-05-21 |
Oracle | ELSA-2025-7387 | OL9 | webkit2gtk3 | 2025-05-23 |
Oracle | ELSA-2025-7995 | OL9 | webkit2gtk3 | 2025-05-23 |
Oracle | ELSA-2025-7672 | OL9 | xdg-utils | 2025-05-23 |
Oracle | ELSA-2025-7427 | OL9 | xterm | 2025-05-23 |
Oracle | ELSA-2025-7430 | OL9 | yelp | 2025-05-23 |
Red Hat | RHSA-2025:8184-01 | EL10 | gstreamer1-plugins-bad-free | 2025-05-27 |
Red Hat | RHSA-2025:8201-01 | EL8 | gstreamer1-plugins-bad-free | 2025-05-27 |
Red Hat | RHSA-2025:8183-01 | EL9 | gstreamer1-plugins-bad-free | 2025-05-27 |
Red Hat | RHSA-2025:8137-01 | EL10 | kernel | 2025-05-26 |
Red Hat | RHSA-2025:8056-01 | EL8 | kernel | 2025-05-26 |
Red Hat | RHSA-2025:7901-01 | EL8.4 | kernel | 2025-05-26 |
Red Hat | RHSA-2025:7903-01 | EL9 | kernel | 2025-05-26 |
Red Hat | RHSA-2025:8142-01 | EL9 | kernel | 2025-05-26 |
Red Hat | RHSA-2025:7897-01 | EL9.0 | kernel | 2025-05-26 |
Red Hat | RHSA-2025:8133-01 | EL9.2 | kernel | 2025-05-26 |
Red Hat | RHSA-2025:8057-01 | EL8 | kernel-rt | 2025-05-26 |
Red Hat | RHSA-2025:7902-01 | EL8.4 | kernel-rt | 2025-05-26 |
Red Hat | RHSA-2025:7896-01 | EL9.0 | kernel-rt | 2025-05-26 |
Red Hat | RHSA-2025:7676-01 | EL9.2 | kernel-rt | 2025-05-26 |
Red Hat | RHSA-2025:8134-01 | EL9.2 | kernel-rt | 2025-05-26 |
Red Hat | RHSA-2025:8132-01 | EL8 | libsoup | 2025-05-26 |
Red Hat | RHSA-2025:8252-01 | EL8.8 | libsoup | 2025-05-28 |
Red Hat | RHSA-2025:8126-01 | EL9 | libsoup | 2025-05-26 |
Red Hat | RHSA-2025:8140-01 | EL9.2 | libsoup | 2025-05-26 |
Red Hat | RHSA-2025:8139-01 | EL9.4 | libsoup | 2025-05-26 |
Red Hat | RHSA-2025:8128-01 | EL10 | libsoup3 | 2025-05-26 |
Red Hat | RHSA-2025:8195-01 | EL8.8 | mingw-freetype and spice-client-win | 2025-05-27 |
Red Hat | RHSA-2025:7967-01 | EL8 | osbuild-composer | 2025-05-23 |
Red Hat | RHSA-2025:8075-01 | EL8.8 | osbuild-composer | 2025-05-23 |
Red Hat | RHSA-2025:8254-01 | EL8 | pcs | 2025-05-28 |
Red Hat | RHSA-2025:8256-01 | EL9 | pcs | 2025-05-28 |
Red Hat | RHSA-2025:8135-01 | EL10 | python-tornado | 2025-05-26 |
Red Hat | RHSA-2025:8136-01 | EL9 | python-tornado | 2025-05-26 |
Red Hat | RHSA-2025:8226-01 | EL9.2 | python-tornado | 2025-05-28 |
Red Hat | RHSA-2025:8223-01 | EL9.4 | python-tornado | 2025-05-28 |
Red Hat | RHSA-2025:8131-01 | EL10 | ruby | 2025-05-26 |
Red Hat | RHSA-2025:8046-01 | EL8 | webkit2gtk3 | 2025-05-27 |
Red Hat | RHSA-2025:7995-01 | EL9 | webkit2gtk3 | 2025-05-27 |
Slackware | SSA:2025-140-01 | aaa_glibc | 2025-05-20 | |
Slackware | SSA:2025-143-01 | ffmpeg | 2025-05-24 | |
Slackware | SSA:2025-140-02 | mozilla | 2025-05-20 | |
Slackware | SSA:2025-147-01 | mozilla | 2025-05-27 | |
SUSE | openSUSE-SU-2025:15150-1 | TW | audiofile | 2025-05-24 |
SUSE | openSUSE-SU-2025:15156-1 | TW | bind | 2025-05-27 |
SUSE | openSUSE-SU-2025:15143-1 | TW | chromedriver | 2025-05-22 |
SUSE | openSUSE-SU-2025:15132-1 | TW | dante | 2025-05-21 |
SUSE | openSUSE-SU-2025:15157-1 | TW | dnsdist | 2025-05-27 |
SUSE | SUSE-SU-2025:20328-1 | elemental-operator | 2025-05-28 | |
SUSE | SUSE-SU-2025:01710-1 | SLE12 | firefox | 2025-05-26 |
SUSE | SUSE-SU-2025:01701-1 | SLE15 SES7.1 oS15.6 | firefox | 2025-05-26 |
SUSE | openSUSE-SU-2025:15133-1 | TW | firefox-esr | 2025-05-21 |
SUSE | SUSE-SU-2025:01702-1 | SLE15 oS15.6 | glibc | 2025-05-26 |
SUSE | openSUSE-SU-2025:15134-1 | TW | gnuplot | 2025-05-21 |
SUSE | SUSE-SU-2025:01653-1 | SLE15 oS15.6 | govulncheck-vulndb | 2025-05-22 |
SUSE | SUSE-SU-2025:01713-1 | SLE15 oS15.6 | govulncheck-vulndb | 2025-05-27 |
SUSE | openSUSE-SU-2025:15135-1 | TW | govulncheck-vulndb | 2025-05-21 |
SUSE | openSUSE-SU-2025:15144-1 | TW | govulncheck-vulndb | 2025-05-23 |
SUSE | openSUSE-SU-2025:15159-1 | TW | govulncheck-vulndb | 2025-05-27 |
SUSE | openSUSE-SU-2025:15145-1 | TW | grafana | 2025-05-23 |
SUSE | openSUSE-SU-2025:15136-1 | TW | grype | 2025-05-21 |
SUSE | SUSE-SU-2025:01718-1 | SLE15 SES7.1 oS15.3 | gstreamer-plugins-bad | 2025-05-28 |
SUSE | SUSE-SU-2025:01717-1 | SLE15 oS15.5 | gstreamer-plugins-bad | 2025-05-28 |
SUSE | openSUSE-SU-2025:15160-1 | TW | jetty-annotations | 2025-05-27 |
SUSE | openSUSE-SU-2025:15161-1 | TW | jq | 2025-05-27 |
SUSE | SUSE-SU-2025:01707-1 | SLE15 oS15.6 | kernel | 2025-05-26 |
SUSE | openSUSE-SU-2025:15146-1 | TW | kind | 2025-05-23 |
SUSE | openSUSE-SU-2025:15147-1 | TW | kubo | 2025-05-23 |
SUSE | openSUSE-SU-2025:15151-1 | TW | libecpg6 | 2025-05-24 |
SUSE | openSUSE-SU-2025:15165-1 | TW | libnss_slurm2 | 2025-05-27 |
SUSE | openSUSE-SU-2025:15167-1 | TW | libyelp0 | 2025-05-27 |
SUSE | SUSE-SU-2025:01716-1 | SLE15 oS15.6 | mariadb | 2025-05-28 |
SUSE | SUSE-SU-2025:20327-1 | nvidia-open-driver-G06-signed | 2025-05-28 | |
SUSE | SUSE-SU-2025:20319-1 | nvidia-open-driver-G06-signed | 2025-05-28 | |
SUSE | SUSE-SU-2025:01658-1 | SLE-m5.1 SLE-m5.2 SLE-m5.3 SLE-m5.4 SLE-m5.5 oS15.3 | open-vm-tools | 2025-05-22 |
SUSE | SUSE-SU-2025:01705-1 | SLE15 SES7.1 | postgresql13 | 2025-05-26 |
SUSE | openSUSE-SU-2025:15137-1 | TW | postgresql13 | 2025-05-21 |
SUSE | SUSE-SU-2025:01654-1 | oS15.6 | postgresql13 | 2025-05-22 |
SUSE | SUSE-SU-2025:01661-2 | SLE15 | postgresql14 | 2025-05-26 |
SUSE | SUSE-SU-2025:01661-1 | SLE15 oS15.6 | postgresql14 | 2025-05-22 |
SUSE | openSUSE-SU-2025:15138-1 | TW | postgresql14 | 2025-05-21 |
SUSE | openSUSE-SU-2025:15139-1 | TW | postgresql15 | 2025-05-21 |
SUSE | openSUSE-SU-2025:15140-1 | TW | postgresql16 | 2025-05-21 |
SUSE | SUSE-SU-2025:01644-1 | SLE15 oS15.6 | postgresql17 | 2025-05-21 |
SUSE | openSUSE-SU-2025:15162-1 | TW | prometheus-blackbox_exporter | 2025-05-27 |
SUSE | SUSE-SU-2025:01523-1 | SLE15 | python-Django | 2025-05-26 |
SUSE | SUSE-SU-2025:01662-1 | SLE15 oS15.6 | python-cryptography | 2025-05-22 |
SUSE | SUSE-SU-2025:20330-1 | python-h11, python-httpcore | 2025-05-28 | |
SUSE | SUSE-SU-2025:01704-1 | MP4.3 SLE15 oS15.4 oS15.6 | python-setuptools | 2025-05-26 |
SUSE | SUSE-SU-2025:01695-1 | SLE12 | python-setuptools | 2025-05-23 |
SUSE | SUSE-SU-2025:01715-1 | SLE15 SLE-m5.1 SLE-m5.2 SES7.1 | python-setuptools | 2025-05-28 |
SUSE | SUSE-SU-2025:01649-2 | SLE15 | python-tornado6 | 2025-05-23 |
SUSE | SUSE-SU-2025:01649-1 | SLE15 oS15.4 oS15.6 | python-tornado6 | 2025-05-22 |
SUSE | SUSE-SU-2025:01709-1 | SLE15 oS15.4 oS15.6 | python310-setuptools | 2025-05-26 |
SUSE | openSUSE-SU-2025:15152-1 | TW | python311-Flask | 2025-05-24 |
SUSE | openSUSE-SU-2025:15153-1 | TW | python311-tornado6 | 2025-05-24 |
SUSE | openSUSE-SU-2025:15163-1 | TW | python312 | 2025-05-27 |
SUSE | openSUSE-SU-2025:15154-1 | TW | python313 | 2025-05-24 |
SUSE | openSUSE-SU-2025:15141-1 | TW | python314 | 2025-05-21 |
SUSE | SUSE-SU-2025:01693-1 | SLE12 | python36-setuptools | 2025-05-23 |
SUSE | SUSE-SU-2025:01723-1 | SLE15 SES7.1 oS15.3 oS15.6 | python39-setuptools | 2025-05-28 |
SUSE | openSUSE-SU-2025:15164-1 | TW | screen | 2025-05-27 |
SUSE | SUSE-SU-2025:20323-1 | sqlite3 | 2025-05-28 | |
SUSE | SUSE-SU-2025:01660-1 | SLE15 oS15.6 | thunderbird | 2025-05-22 |
SUSE | openSUSE-SU-2025:15131-1 | TW | thunderbird | 2025-05-21 |
SUSE | openSUSE-SU-2025:15149-1 | TW | thunderbird | 2025-05-24 |
SUSE | openSUSE-SU-2025:15155-1 | TW | transfig | 2025-05-24 |
SUSE | SUSE-SU-2025:01651-1 | MP4.3 SLE15 SLE-m5.1 SLE-m5.2 SLE-m5.3 SLE-m5.4 SLE-m5.5 SES7.1 oS15.6 | ucode-intel | 2025-05-22 |
SUSE | SUSE-SU-2025:01650-1 | SLE12 | ucode-intel | 2025-05-22 |
SUSE | openSUSE-SU-2025:15166-1 | TW | umoci | 2025-05-27 |
SUSE | SUSE-SU-2025:01724-1 | MP4.3 SLE15 oS15.4 | webkit2gtk3 | 2025-05-28 |
SUSE | SUSE-SU-2025:01720-1 | SLE12 | webkit2gtk3 | 2025-05-28 |
SUSE | SUSE-SU-2025:01703-1 | SLE15 oS15.6 | xen | 2025-05-26 |
SUSE | openSUSE-SU-2025:15142-1 | TW | xen | 2025-05-21 |
Ubuntu | USN-7525-1 | 18.04 20.04 22.04 24.04 | Tomcat | 2025-05-21 |
Ubuntu | USN-7525-2 | 24.04 24.10 25.04 | Tomcat | 2025-05-27 |
Ubuntu | USN-7526-1 | 24.10 25.04 | bind9 | 2025-05-21 |
Ubuntu | USN-7536-1 | 20.04 22.04 24.04 24.10 | cifs-utils | 2025-05-27 |
Ubuntu | USN-7534-1 | 25.04 | flask | 2025-05-26 |
Ubuntu | USN-7532-1 | 20.04 22.04 24.04 24.10 25.04 | glib2.0 | 2025-05-26 |
Ubuntu | USN-7541-1 | 18.04 20.04 22.04 | glibc | 2025-05-28 |
Ubuntu | USN-7535-1 | 16.04 18.04 20.04 22.04 24.04 24.10 25.04 | intel-microcode | 2025-05-27 |
Ubuntu | USN-7527-1 | 16.04 18.04 20.04 | libfcgi-perl | 2025-05-22 |
Ubuntu | USN-7510-7 | 20.04 22.04 | linux-aws, linux-intel-iotg-5.15, linux-nvidia-tegra-igx, linux-raspi | 2025-05-28 |
Ubuntu | USN-7521-2 | 24.10 | linux-aws | 2025-05-22 |
Ubuntu | USN-7510-6 | 22.04 | linux-aws-fips | 2025-05-27 |
Ubuntu | USN-7517-3 | 20.04 | linux-bluefield | 2025-05-26 |
Ubuntu | USN-7516-5 | 18.04 | linux-hwe-5.4 | 2025-05-23 |
Ubuntu | USN-7513-4 | 22.04 | linux-hwe-6.8 | 2025-05-28 |
Ubuntu | USN-7516-6 | 20.04 | linux-ibm | 2025-05-26 |
Ubuntu | USN-7517-2 | 18.04 | linux-ibm-5.4 | 2025-05-21 |
Ubuntu | USN-7521-3 | 24.04 24.10 | linux-lowlatency, linux-lowlatency-hwe-6.11, linux-oracle | 2025-05-28 |
Ubuntu | USN-7516-4 | 18.04 | linux-oracle-5.4 | 2025-05-21 |
Ubuntu | USN-7539-1 | 20.04 | linux-raspi | 2025-05-28 |
Ubuntu | USN-7524-1 | 24.04 | linux-raspi | 2025-05-26 |
Ubuntu | USN-7540-1 | 18.04 | linux-raspi-5.4 | 2025-05-28 |
Ubuntu | USN-7537-1 | 20.04 22.04 24.04 24.10 25.04 | net-tools | 2025-05-27 |
Ubuntu | USN-7533-1 | 24.10 25.04 | openjdk-17-crac | 2025-05-26 |
Ubuntu | USN-7531-1 | 24.10 25.04 | openjdk-21-crac | 2025-05-26 |
Ubuntu | USN-7520-2 | 25.04 | postgresql-17 | 2025-05-21 |
Ubuntu | USN-7280-2 | 14.04 16.04 18.04 20.04 22.04 24.10 | python | 2025-05-22 |
Ubuntu | USN-7528-1 | 20.04 22.04 24.04 24.10 25.04 | sqlite3 | 2025-05-22 |
Ubuntu | USN-7529-1 | 20.04 22.04 | tika | 2025-05-26 |
Kernel patches of interest
Kernel releases
Architecture-specific
Build system
Core kernel
Development tools
Device drivers
Device-driver infrastructure
Documentation
Filesystems and block layer
Memory management
Networking
Security-related
Virtualization and containers
Miscellaneous
Page editor: Joe Brockmeier