“If ‘some’ is good, ‘more’ must be better,” goes the old adage – and in many cases, that’s true. But not when it comes to data. Keeping data too long can be messy and dangerous: organizations that hold on to data they no longer need open themselves up to lawsuits and regulatory actions, which is a key reason to delete it.
Cost savings and more efficient operations are an added bonus of smart retention policies. A study by RingCentral shows that employees lose as many as 32 business days a year just moving between applications, folders, windows, and databases to gather the information they need. That’s a tremendous waste of time and resources – but what choice do organizations have?
Determining what should be deleted is a major challenge, given the plethora of data that has piled up in many organizations. Data is stored away in repositories, hard drives, databases, and web storage areas – but the organization is responsible for all of it, and all of it is fair game for lawyers and regulators seeking evidence of legal misbehavior or oversight. Organizations need to develop a methodology for determining what they should keep and what they should delete, both to save time for employees and to protect themselves from legal action.
The key to that is getting control of data – discovering where it is and what it is all about. Classifying data by topic, customer, organization, product, or whatever other criteria an organization uses can help ensure that control. The question, then, is how to discover those criteria – to determine which ones are most important to the organization, thus clarifying whether data needs to be retained or disposed of.
One way to do that is by setting up a Machine Learning system using training sets that consist of what power users have kept or thrown out in the past. Machine Learning-based systems can peruse data that exists in repositories throughout the organization, examining its characteristics and how relevant it is to projects that are being worked on or clients that currently do business with the company. Teams have reams of documents, files, spreadsheets, and other material about projects and clients – and when they begin throwing out the ones they don’t need, the system learns from those actions which criteria are most important and relevant.
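As a rough illustration of learning from past keep-or-delete decisions, here is a minimal sketch of a naive Bayes classifier written with only the Python standard library. Everything in it is an assumption made for illustration: the short text descriptions of files, the labels "keep" and "delete", the sample history, and the function names are hypothetical, and a production system would likely use richer features and an established library.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: each file is summarized by a short whitespace-separated description.
    return text.lower().split()

def train(examples):
    """Count words per label from (description, label) pairs.

    Assumes both labels, 'keep' and 'delete', appear at least once.
    """
    word_counts = {"keep": Counter(), "delete": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["keep"]) | set(word_counts["delete"])
    return word_counts, label_counts, vocab

def delete_probability(model, text):
    """Estimate the probability that a file would be deleted (naive Bayes, add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in ("keep", "delete"):
        log_p = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                log_p += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = log_p
    # Convert log scores to a normalized probability without overflow.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["delete"] / (weights["keep"] + weights["delete"])

# Hypothetical history of what power users kept or threw out.
history = [
    ("client acme contract 2023 active", "keep"),
    ("project apollo spec current", "keep"),
    ("client zenith invoice 2012 closed", "delete"),
    ("project legacy draft 2011 closed", "delete"),
]
model = train(history)
old_file_score = delete_probability(model, "invoice 2012 closed")
active_file_score = delete_probability(model, "acme contract active")
```

With this toy history, descriptions that resemble previously deleted files score above 0.5 and descriptions that resemble retained files score below it. The point is only that the criteria are inferred from users' past actions rather than written by hand.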
Using that model, the system can present to teams the files that are prime candidates for deletion, with all the relevant data gathered into a single panel or screen – thus augmenting the work of employees by discovering potentially unneeded data on which they need to take action. Based on their needs and company policy, teams can then decide if a file can be safely deleted. And because the Machine Learning-based system improves with each round of data examination, it will increase its accuracy over time, allowing teams to further increase their efficiency and ensure they are keeping only the data they need.
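The review step described above can be sketched in a few lines, under stated assumptions. The `FileRecord` type, the `delete_score` field (standing in for whatever score a model produces), the 0.8 threshold, and the sample paths are all hypothetical. The sketch shows only the shape of the workflow: the system proposes, people decide, and their decisions become new training examples.

```python
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    delete_score: float  # hypothetical model output in [0, 1]

def deletion_candidates(records, threshold=0.8, limit=10):
    """Return the highest-scoring files at or above the threshold, best first."""
    flagged = [r for r in records if r.delete_score >= threshold]
    return sorted(flagged, key=lambda r: r.delete_score, reverse=True)[:limit]

def record_decisions(candidates, decisions, history):
    """Append human decisions (path -> 'keep' or 'delete') to the training history.

    Nothing is deleted here; the function only logs what reviewers decided
    so that a later training round can learn from it.
    """
    for record in candidates:
        if record.path in decisions:
            history.append((record.path, decisions[record.path]))
    return history

# Hypothetical scored files.
records = [
    FileRecord("archive/zenith_invoice_2012.pdf", 0.93),
    FileRecord("clients/acme_contract_2023.docx", 0.08),
    FileRecord("archive/legacy_draft_2011.doc", 0.85),
    FileRecord("projects/apollo_spec.md", 0.40),
]
candidates = deletion_candidates(records)
candidate_paths = [r.path for r in candidates]

# A reviewer confirms one deletion and overrides the other suggestion.
history = record_decisions(
    candidates,
    {
        "archive/zenith_invoice_2012.pdf": "delete",
        "archive/legacy_draft_2011.doc": "keep",
    },
    [],
)
```

Keeping the final decision with the team, and recording overrides as well as confirmations, is what allows later training rounds to correct the model's mistakes.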
AI systems can indeed help organizations get control of their data – ensuring that they know what it is, where it is, and why they need it. And if they don’t need it, those files and documents can safely be disposed of. Accomplishing this manually, of course, would be a Sisyphean task, requiring a huge Business Intelligence team to sort through data. But with a Machine Learning-based system augmenting the work of teams who make the final decision on what needs to be deleted, organizations can get control of their data and ensure that they are prepared for any eventuality – whether it’s an attorney doing discovery, a regulator seeking information, or staff who need to work more efficiently with less data to search through.