IT backup & Co. - You ask, Empalis answers: Our experts explain technical IT backup concepts and terms from a consultant's perspective.

A backup (data protection) means being able to return to a previous state of the data.

One needs a backup in case of data loss, tampering or lack of access to the original data.


The implementation of a backup environment depends, among other things, on the requirements for retention time, number of copies (also remote) and restore speed. The environment to be backed up must also be considered to determine the best backup tools.

Microsoft 365 only stores deleted data for a limited time: 93 days for SharePoint and OneDrive, plus, by default, another 14 days as a backup for SharePoint. However, if the recycle bin becomes too large, it will also start deleting data earlier.


For mail, the default retention is 14 days, extendable to 30 days.


To store data securely, additional connectable solutions are necessary. Many options exist; the right choice depends on your needs and requirements.





Recovery is the process of retrieving data from a previously created backup.


There are many reasons why a restore becomes necessary: accidental or intentional alteration, overwriting or deletion of data, hardware failure, software errors, virus attacks, disasters, etc. During recovery, the data is written back to the original or to a new storage location.


Backup (data protection) is therefore meant to protect against such losses. However, the backup itself can be affected by the same problems: it runs on hardware, it relies on software, and the backed-up data may long since have been changed, overwritten, or infected by viruses.


The danger of hardware failures or regional disasters can be mitigated by remote backup copies. But whether backups are restorable as expected and desired is usually only determined in an emergency. Regular restore tests and scans of the backed-up data can help to find out in advance.


Ideally, all backed up data should be tested, since restoring individual data or systems usually works well, but does not prove that the entire environment can be restored. Due to the amount of data, these tests should also be automated.
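Such automated checks can be sketched in a few lines of Python. The sketch assumes, hypothetically, that a checksum manifest was recorded at backup time; the restore test then re-hashes the restored files and reports mismatches:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(manifest: dict, restore_dir: Path) -> list:
    """Compare restored files against checksums recorded at backup time.

    Returns the relative paths that are missing or corrupted."""
    failures = []
    for rel_path, expected in manifest.items():
        restored = restore_dir / rel_path
        if not restored.exists() or sha256_of(restored) != expected:
            failures.append(rel_path)
    return failures
```

In practice such a script would run on a schedule against a scratch restore area and alert on any non-empty failure list.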

RTO stands for "Recovery Time Objective" and refers to the amount of time an application, system or process is allowed to be down without causing significant damage to the business. This window includes the time needed for recovery.


Example: If the RTO is three hours, substantial investment is required in failover data centers, automation, telecommunications, etc., since a complete recovery must be achieved within three hours. If the RTO is specified as three weeks, there is significantly more time to acquire new resources and perform the recovery.
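The underlying trade-off can be made explicit with a back-of-the-envelope calculation; the loss figures below are purely hypothetical:

```python
def max_tolerable_rto_hours(loss_per_hour: float, absorbable_damage: float) -> float:
    """Upper bound for the RTO: how long can operations be down before
    the cumulative loss exceeds what the business can absorb?"""
    return absorbable_damage / loss_per_hour

# Hypothetical: 50,000 EUR lost per hour of downtime and 150,000 EUR of
# damage the business can absorb imply an RTO of no more than 3 hours.
rto_limit = max_tolerable_rto_hours(50_000, 150_000)
```

The shorter the resulting limit, the more must be spent on standby infrastructure to actually meet it.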


RPO stands for "Recovery Point Objective" and refers to the maximum amount of data loss an organization can sustain between a backup and a system failure.


Example: Can you tolerate losing an hour of your work, four hours of your work, or even days or weeks of work on a document?


This determination is very important, because it allows you to define the interval of a backup. If the RPO is six hours, then a backup must run every six hours. If the backup were to run only every 12 hours, the risk of losing too much data would be too high. However, if it ran every hour, the cost of resources (storage, etc.) would be too high.
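The relationship between RPO and backup interval described above can be sketched in a few lines:

```python
import math

def backups_per_day(rpo_hours: float) -> int:
    """Minimum number of equally spaced daily backups so that the
    worst-case data loss stays within the RPO."""
    return math.ceil(24 / rpo_hours)
```

An RPO of six hours yields four backups per day; halving the RPO doubles the backup frequency and, with it, the resource cost.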

The 3-2-1 rule is the "golden rule" of backup solutions. It breaks down as follows:

  • Use of 3 copies of data (original and 2 backups).
  • On at least 2 types of storage (HDD / removable media / cloud).
  • With 1 offsite copy (in an external location / tape backup).
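A minimal sketch of how a backup plan could be checked against the rule; the `media`/`offsite` attributes are an assumed, illustrative data model, not a real tool's API:

```python
def satisfies_3_2_1(copies: list) -> bool:
    """Check a list of backup copies against the 3-2-1 rule.

    Each copy is a dict such as {"media": "tape", "offsite": True};
    the original data counts as one of the copies."""
    return (
        len(copies) >= 3                               # 3 copies in total
        and len({c["media"] for c in copies}) >= 2     # on 2 media types
        and any(c["offsite"] for c in copies)          # 1 copy offsite
    )

plan = [
    {"media": "disk", "offsite": False},   # original / primary data
    {"media": "disk", "offsite": False},   # backup on a 2nd server
    {"media": "tape", "offsite": True},    # tape copy, vaulted offsite
]
```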

The idea behind the concept is to eliminate the so-called Single Point of Failure (SPOF), a risk in the design or implementation. This prevents, for example, a malware attack from crippling the entire system, and ensures that in case of an environmental disaster, such as a flood, a copy of all data remains in a safe place.


In operation, the 3-2-1 concept may look like this:


From a master backup, the backup software creates a copy. This copy is then replicated to 2 places: a) to a 2nd server (e.g., in a redundant server room) for fast operational access, and b) to another data medium (e.g., the cloud).

The 3rd copy is made on tape and stored offsite at another location.


It should also be considered that cyber-attacks have become a very acute topic over the last few years; with a classic backup at only one location, one runs the risk of losing all data. A current example is Log4j. Therefore, like other experts, we consider the 3-2-1 concept indispensable for modern data backup under today's requirements.

The Disaster Recovery Plan should ideally be a document that specifies what should be done, when, and how once a disaster occurs. The plan should define which actions are needed for recovery.


Let's say your company location burned to the ground. What would you do now?


So now you need to procure new hardware and rebuild the backup server. After that, all backup data is restored to the backup server so that the clients can later be restored from it and productive work can resume.


But how do I restore the backup server?

There is a DR plan file for that. For example, a Disaster Recovery plan file for IBM Spectrum Protect defines how the TSM server was built and configured, where which backup data was stored, and e.g., which LTO tapes are needed to restore all complete backups.

Backup is only half the battle: Checking backups with analysis and monitoring tools is well known. Some software manufacturers offer so-called "health checks" of the backups. This ensures that the backups are valid and usable.


However, a successful backup of data is of no use to anyone if the restore fails. According to the recent Veeam study "Data Protection Trends Report 2022", the organizations and companies surveyed were able to restore their data in only 64% of cases. This means that more than a third of the data could not be restored, partly because unnoticed errors had already occurred during the backup, partly because the recovery itself did not work. Performing regular restore tests is therefore not just reasonable, but a duty.


Just as there is a structured backup strategy, recovery must also be established as a process in the backup concept. A recovery plan with regular restore tests applies to all backups.

There is no such thing as "the" protection against ransomware. An improvement of the protection is achieved by several different starting points:


The 3-2-1 rule serves as a basis, i.e., 3 copies on 2 different media, 1 of which is offsite. This can also be extended; Veeam, for example, has made it a 3-2-1-1-0 rule. Here, 1 immutable backup (a data set that cannot be changed) and 0 errors during restore verification (meaning regular restore tests) are added. But this alone is only one bundle of measures to protect against ransomware.

Protecting against ransomware also includes protecting the backup environment itself: separate services and use different accounts for access (do not run everything under the admin user for convenience). Since many attacks prefer to focus on Windows environments, run the backup servers under Linux; use network connections to the nodes only when needed and do not leave them open permanently.


Thinking that a successful backup is sufficient is too short-sighted. Only a well thought-out and structured backup, combined with network security measures and user-specific access rights to data and services, plus regular checking of backups and testing of restores, leads to effective protection against ransomware attacks.


Conclusion: Only a coherent concept offers protection!

"Restricted" business continuity - even in an emergency.


In the event of an attack, backup data can be used to provide emergency production (e.g., ReadOnly mode):

  • Emergency production in a "safe" area (emergency data center / cloud, etc.)
  • Provide critical information (verified / reviewed / secure versions, etc.): not fully up to date, but unable to cause further damage.

Preparation and training only work with backup data or a disaster recovery solution for the emergency case.


Today it is no longer a question of if it will happen, but when it will happen: Only those who expect the "real" disaster recovery case (worst case scenario) have a chance.


With appropriate exercises of emergency procedures (DR tests), the impact on RTO and RPO can be minimized in the disaster case. However, DR procedures can only really be tested with backup data; with production data this is possible only to a very limited extent. Moreover, in the "worst case scenario", only the backup data is still available.


Threat detection / forensics


Threat detection requires that backup data is analyzed and scanned for "dormant" threats. Even "older" backup versions can be checked using the latest procedures. The important question to evaluate: would the attack have gone undetected, or could early detection of long-planned attacks have prevented them?

Help and support in forensics (when / what / where etc.) provide important lessons learned.


And after the attack...?


After an attack, try to restore data and systems with the lowest possible RTO and RPO, and bring production back up with as few negative surprises as possible.

The Recovery Point Objective (RPO) is the key figure for expressing the tolerable loss of data. The more critical an application is, the lower the RPO should be.

To keep this value low, data must be backed up more frequently. For example, by backing up data 4 times a day, the RPO value can be reduced from a maximum of 24 hours (for a daily backup) to a maximum of 6 hours.
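Conversely, the worst-case RPO that a given schedule delivers can be read off directly:

```python
def worst_case_rpo_hours(backups_per_day: int) -> float:
    """Worst-case data-loss window for equally spaced daily backups."""
    return 24 / backups_per_day

# 1 backup/day allows up to 24 h of loss; 4 backups/day reduce it to 6 h.
```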


Depending on the workload, various technologies are available here:


In the file environment, versioning of changes and snapshot procedures are used to minimize data loss. These snapshots can also be moved to independent standby systems or to a backup system.

For database applications, low RPO values are achieved through frequent (e.g., every 10 minutes) log backups. Databases can also be recovered by rolling forward the available logs, usually with to-the-second precision, up to the point of data corruption.
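The roll-forward idea can be sketched conceptually. This is not a real database's recovery command, only the selection logic for which log backups to replay after restoring the last full backup:

```python
from datetime import datetime, timedelta

def logs_to_apply(full_backup_time, log_backup_times, recovery_point):
    """Select the log backups to roll forward after restoring the full
    backup, up to the chosen recovery point (e.g., just before the
    data corruption occurred)."""
    return sorted(t for t in log_backup_times
                  if full_backup_time < t <= recovery_point)

# Hypothetical schedule: full backup at midnight, log backups every 10 min.
full = datetime(2024, 1, 1, 0, 0)
logs = [full + timedelta(minutes=10 * i) for i in range(1, 13)]
corruption = full + timedelta(minutes=45)
replay = logs_to_apply(full, logs, corruption)  # logs at 10, 20, 30, 40 min
```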


Clusters and replication mechanisms make databases highly available and provide additional protection against failure.

For servers and virtual machines, it is possible to implement a continuous data protection solution that sends every write access to the hard disks to an independent copy. In addition, clusters and replication systems can provide independent copies, which can also be protected with backup systems.

For a backup solution to be ready for future workloads, it is very important that the solution is scalable. Along with proper sizing, you are prepared for the future.


Using different information from the environment, such as the number of servers to be backed up, whether virtual or physical, as well as the expected annual data growth (Annual Growth Rate), it is possible to estimate which storage and backup server capacities are required.


Of course, the number of copies to be kept, the retention period and the use of deduplication also play a role here.
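A rough capacity estimate along these lines might look as follows; the figures are hypothetical and no substitute for a vendor sizing tool:

```python
def backup_capacity_tb(primary_tb, annual_growth_rate, years, copies, dedup_ratio):
    """Rough backup-storage estimate: grow the primary data by the
    annual growth rate, multiply by the number of retained copies, and
    divide by the deduplication ratio."""
    grown = primary_tb * (1 + annual_growth_rate) ** years
    return grown * copies / dedup_ratio

# Hypothetical: 100 TB primary data, 10% annual growth, 3-year horizon,
# 30 retained copies, 10:1 deduplication.
estimate = backup_capacity_tb(100, 0.10, 3, 30, 10)
```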

Performance is a critical factor for the efficiency of a data protection environment. Many factors have an influence on the performance consideration and initially some questions should be answered:

  • Which values are achievable at all due to the hardware used?
    -> Performance bottlenecks are often caused by a single component.
  • Is the environment optimized for backup or restore performance?
    -> Deduplication and compression are beneficial for backup performance, but possibly negative for restore speed.
  • Are all backup versions equally weighted from a performance point of view?
    -> Tiering older backup versions to cheaper storage will reduce restore performance.
  • Is the environment prepared for mass restore requests?
    -> It makes a big difference whether only a single file or a single server must be restored, or whether 100 servers have to be restored in parallel.
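The last point can be quantified with an idealized throughput estimate; real restores are further limited by the slowest component in the chain (network, disk, tape drives):

```python
def restore_hours(total_tb, per_stream_tb_per_hour, parallel_streams):
    """Idealized restore-time estimate: total data divided by the
    aggregate throughput of all parallel restore streams."""
    return total_tb / (per_stream_tb_per_hour * parallel_streams)

# Hypothetical: 100 servers of 1 TB each, 0.5 TB/h per stream,
# 10 parallel streams vs. restoring a single 1 TB server.
mass_restore = restore_hours(100, 0.5, 10)
single_restore = restore_hours(1, 0.5, 1)
```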

Sizing tools and vendor blueprints provide a good basis to map requirements and justify performance expectations.

Monitoring and reporting of the backup environment also helps to identify performance bottlenecks and trends.

Furthermore, regular restore tests can help validate the performance of the data protection environment and identify bottlenecks.

Protecting a backup environment against attacks requires multi-layered measures. It starts with "hardening" your environment accordingly:


All unnecessary services must be disabled, users should be limited to the necessary people, and accounts should be local and not come through AD (Active Directory), etc.

Hardening measures should be taken on the operating system side and in the backup application. For example, password rules apply in the operating system and in Spectrum Protect. Multi-factor authentication can be set up for access to the environment as another layer of security.

This has already created a hurdle for attackers, which of course is not insurmountable.


Therefore, the backup should also be designed according to the 3-2-1 rule. There should be at least 3 copies of the data distributed on at least two different media. Of these, at least one version is at a different location. Ideally, the media used also provides an air gap that does not allow these attackers to modify the data. Documentation for recovery should be available offline. Necessary passwords should also be stored securely in the vault so that recovery is successful even without access to internal password safes and information that was stored on internal servers.


A restore of the environment should also be tested regularly, since in the event of a disaster one can fall back on what has been rehearsed and does not start from scratch. For Spectrum Protect, the database backup and the associated information should also be in at least two locations and at least once on a medium that cannot be changed by attackers (tape).


Destructive commands should be released according to a four-eyes principle, and all administrative operations that influence the recoverability of the backup environment should be monitored.

This makes it more difficult for attackers to intervene or to detect them at an early stage.

When storing and backing up data, it is always important to understand what type of data is involved. Only then can you tell if the current strategy still makes sense or if you should adjust it:

  • Is the data accessed often?
  • Reading and / or writing?
  • Is special protection necessary for this data or is one-time storage sufficient?
  • Is the data moved little, is it perhaps even archive data?
  • Are the data volumes very large, making hard disk storage too expensive in the long run?

Once this understanding has been gained, either through analysis or simply through experience, the key decisions can be made. When it comes to backup, for example, it is still advisable to consider using tape. Tapes can be ejected from the tape library, stored "cold in the closet", and cause no ongoing costs.


However, not only the storage of data causes costs, but also its processing, i.e.: infrastructure.

The type of disk storage on which data is stored has a significant impact on costs. It is important to find a healthy balance between performance, availability, and costs. For example, it does not always make sense to store backup data on block storage with SAN infrastructure. A redundantly designed NAS may provide the required performance for a fraction of the cost, especially if you look at its deduplication capabilities.


Finally, the software used offers a great potential for savings:

  • Are there still contracts with large software houses for products that could be obtained more cheaply from smaller vendors?
  • Is the software itself still suitable for the task or are there now other providers who can offer the same performance for less money?

This point should be evaluated regularly and discussed with the trusted IT partner. The range of high-performance solutions has become so extensive that it is worthwhile to weigh them up regularly.

This question can be answered in two sentences:


Immutable storage replaces the air gap in backup when no tape storage is available.

This prevents backed-up data from being changed later, for example by a ransomware attack.
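The behavior can be illustrated with a minimal write-once-read-many (WORM) sketch. Real immutable storage (e.g., object locks or WORM tape) enforces this at the storage layer rather than in application code:

```python
class WormStore:
    """Minimal WORM sketch: once a backup object is written, any attempt
    to overwrite or delete it is rejected, which is the guarantee
    immutable storage provides against ransomware."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        if key in self._objects:
            raise PermissionError(f"{key} is immutable and cannot be overwritten")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        raise PermissionError("deletes are not permitted on immutable storage")
```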

Cyber resilience in the backup environment refers to how well an organization can respond to cyber-attacks or disruptions and quickly recover its systems, data, and business processes. It involves ensuring that the backup system is robust and resilient enough to provide protection against threats such as ransomware, malware, phishing attacks or other cyberattacks.

An organization with a high level of cyber resilience in the backup environment will typically have implemented a combination of technologies, processes, and training to protect its data and systems from cyber-attacks while having the ability to quickly respond to and recover from disruptions. These include:

  • Regular backups stored on different storage media and locations to prevent data loss.
  • The implementation of encryption and authentication technologies to protect data from unauthorized access.
  • Fast and effective incident response management to quickly detect, report, and respond to cyber-attacks.
  • Training for employees to increase cybersecurity awareness and reduce the risk of attacks.

High cyber resilience in the backup environment can ensure an organization is able to maintain and quickly recover its business processes in the event of cyber-attacks or other disruptions.