The sudden growth of the artificial intelligence (A.I.) industry has prompted a flurry of concerns: Will A.I. make human workers obsolete? Will models start repeating each other, bringing down the quality of the internet?
But there’s another significant concern that might quickly outweigh the others. A.I. tools create data, and lots of it, and the hardware behind them is growing just as quickly. NVIDIA is expected to ship 1.5 million A.I. server units per year by 2027. Running at full capacity, those servers would consume at least 85.4 terawatt-hours of electricity annually with current technologies. That’s more than the total annual electrical consumption of countries like Austria, Switzerland, and Peru.
The data centers that house those servers will also need to store A.I. data somewhere. If that data is accessed regularly, it will end up on hard drives, and there may not be enough hard drives to meet the demand.
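As a rough sanity check, the server count and electricity figure above imply an average power draw per server. The sketch below assumes every unit runs at a constant load for all 8,760 hours of a non-leap year, which is a simplification; the function and constant names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, servers never idle


def implied_kw_per_server(annual_twh: float, servers: float) -> float:
    """Average continuous draw per server, in kilowatts."""
    annual_kwh = annual_twh * 1e9  # 1 TWh = 1e9 kWh
    return annual_kwh / servers / HOURS_PER_YEAR


# Figures cited in the article: 85.4 TWh per year across 1.5 million servers.
kw = implied_kw_per_server(annual_twh=85.4, servers=1.5e6)
print(f"{kw:.2f} kW per server")
```

Under those assumptions, each server would draw roughly 6.5 kW around the clock.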
Hard drive manufacturers will need to scale operations to address A.I. data.
An analysis from Everypixel Journal puts the size of the A.I. data footprint into perspective:
- As of 2023, A.I. tools had created about 15.47 billion images, a total that took photographers roughly 150 years to reach.
- People create about 34 million A.I.-generated images per day.
- A.I. data usage isn’t limited to asset creation. ChatGPT’s website receives nearly 1.5 billion visits per month, and the average visit lasts about 7 minutes and 36 seconds.
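To get a feel for what the daily image count above means for storage, the sketch below converts it to terabytes. The average file size is a hypothetical placeholder, not a figure from the Everypixel analysis, so treat the output as an order-of-magnitude illustration only.

```python
IMAGES_PER_DAY = 34_000_000   # daily image count cited above
ASSUMED_MB_PER_IMAGE = 1.0    # hypothetical average file size, not from the analysis


def daily_terabytes(images: int, mb_per_image: float) -> float:
    """Convert an image count to decimal terabytes (1 TB = 1,000,000 MB)."""
    return images * mb_per_image / 1_000_000


per_day = daily_terabytes(IMAGES_PER_DAY, ASSUMED_MB_PER_IMAGE)
per_year = per_day * 365
print(f"{per_day:.0f} TB per day, {per_year / 1000:.2f} PB per year")
```

With a 1 MB placeholder, that works out to 34 TB per day, or about 12.4 PB per year, for images alone. Changing `ASSUMED_MB_PER_IMAGE` scales the result linearly.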
Unfortunately, the data storage industry isn’t positioned to handle the sudden (and potentially exponential) increase. In 2022, researchers at Aston University predicted a “data storage crunch” within years — which could lead to reduced internet speeds for global users, with potentially dire economic consequences.
New technologies could help to address A.I.’s storage consumption.
There’s some good news: The vast majority of A.I.-generated data will be cold — after its creation, it won’t be accessed regularly (and we’d expect that most data will never need to be accessed again).
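That makes data tapes a viable solution. An LTO-9 (Linear Tape-Open, generation 9) cartridge stores 18 terabytes natively, or up to 45 terabytes with 2.5:1 compression, and the LTO project’s Linear Tape File System (LTFS) presents a tape’s contents as ordinary files and folders for reasonably fast read access when necessary.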
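The per-cartridge capacity translates directly into a cartridge count for a given archive. The sketch below compares the 45 TB compressed figure with LTO-9's 18 TB native capacity; the 1 PB archive size is a hypothetical example, and the function name is illustrative.

```python
import math

LTO9_COMPRESSED_TB = 45  # per-cartridge figure cited above (2.5:1 compression)
LTO9_NATIVE_TB = 18      # LTO-9 native (uncompressed) capacity


def cartridges_needed(dataset_tb: float, capacity_tb: float) -> int:
    """Whole cartridges required to hold a dataset of the given size."""
    return math.ceil(dataset_tb / capacity_tb)


dataset_tb = 1000  # hypothetical 1 PB archive
best_case = cartridges_needed(dataset_tb, LTO9_COMPRESSED_TB)
worst_case = cartridges_needed(dataset_tb, LTO9_NATIVE_TB)
print(f"{best_case} cartridges compressed, {worst_case} cartridges native")
```

For the 1 PB example, that is 23 cartridges at the compressed rating and 56 at native capacity. Data that is already compressed, such as JPEG images, rarely shrinks much further, so the native figure is likely the safer planning number for A.I.-generated media.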
Of course, data centers will probably have issues separating A.I.-generated data from human-generated data, so HDD-based storage will still be necessary at the enterprise level for the foreseeable future. Modern tape formats offer fast sequential transfers, but tape must be wound to the right position before a file can be read, so random access is far slower than on HDD-based RAID arrays.
Two major hard drive technologies can extend areal densities without sacrificing reliability:
- Heat-Assisted Magnetic Recording (HAMR), which uses a laser to briefly heat a tiny spot on the platter so that data can be written to smaller, more thermally stable magnetic regions.
- Shingled Magnetic Recording (SMR), which partially overlaps adjacent tracks like roof shingles, resulting in narrower effective tracks (and greater storage capacities).
Apart from technology improvements, data centers can address limited storage media with more efficient data-handling methods. Hard drive manufacturers can shore up their supply chains to improve logistics (and limit the environmental effects of a sudden surge in media production).
At Datarecovery.com, we’ve seen the storage industry find novel ways to adapt to new challenges. We’ve also invested in our own research and development to provide reliable services for enterprises, including RAID data recovery, ransomware recovery, and penetration (pen) testing.
To learn more, call 1-800-237-4200 to speak with a member of our team or submit a request online.