If Exadata Storage Server is to be shut down when one or more databases are running, then you must verify that taking Exadata Storage Server offline will not impact Oracle ASM disk group and database availability. The ability to take Exadata Storage Server offline without affecting database availability depends on the level of Oracle ASM redundancy used on the affected disk groups. Availability also depends on the current status of disks in other Exadata Storage Servers that have mirror copies of data for the Exadata Storage Server that you are taking offline.
The preceding command checks whether any disks are offline, in predictive failure status, or need to have data copied to their mirrors. If Oracle ASM redundancy is intact, then the command takes the grid disks offline in Oracle ASM, and then stops the cell services. If the following error is displayed, then it may not be safe to stop the cell services because a disk group may be forced to dismount due to reduced redundancy.
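These checks can also be performed manually with CellCLI before stopping the cell. A hedged sketch of the sequence (the attribute names are from the CellCLI reference; output and safe-to-proceed conditions depend on your environment):

```
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome

CellCLI> ALTER GRIDDISK ALL INACTIVE

CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
```

Proceed with inactivating the grid disks only if asmdeactivationoutcome is Yes for every grid disk on the cell; any other value indicates that taking the cell offline could affect disk group availability.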
Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains Oracle ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations are queued.
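To see which Oracle ASM instance is running which rebalance, and what is still queued, you can query the standard Oracle ASM operation view from any database or ASM instance in the cluster (column availability varies by release):

```sql
-- One row per active or queued rebalance, across all ASM instances
SELECT inst_id, group_number, operation, state, power, est_minutes
FROM   gv$asm_operation;
```

A state of RUN indicates an operation in progress; WAIT indicates a queued operation.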
Each XT storage server includes twelve high-capacity disk drives. However, to achieve a lower cost, flash devices are not included. Also, Oracle Exadata System Software licensing is optional, with some software features disabled unless licensed. Hybrid Columnar Compression is included without requiring Oracle Exadata System Software licenses.
An Oracle ASM disk group should use storage provided by only one type of storage server (HC, EF, or XT). After adding the XT storage servers to your rack, create new disk groups to use the storage. The default disk group name for XT storage servers is XTND. However, you can use a different name as required.
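After creating grid disks on the XT storage servers, the new disk group can be created from an Oracle ASM instance. A minimal sketch, assuming the default XTND grid disk prefix and 19c compatibility settings; adjust the redundancy level and attribute values to your environment:

```sql
CREATE DISKGROUP xtnd HIGH REDUNDANCY
  DISK 'o/*/XTND*'
  ATTRIBUTE 'compatible.asm'   = '19.0.0.0.0',
            'compatible.rdbms' = '19.0.0.0.0',
            'au_size'          = '4M';
```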
XT storage servers provide fully integrated storage for Oracle Database. You can use the new disk group with database features such as Oracle Partitioning, Oracle Automatic Data Optimization, and Oracle Advanced Compression.
Every Oracle Exadata Storage Server in Oracle Exadata Rack has a system area, which is where the Oracle Exadata System Software resides. In Exadata Database Machine X7 and later systems, two internal M.2 devices contain the system area. In all other systems, the first two disks of Oracle Exadata Storage Server are system disks, and the portions on these system disks are referred to as the system area.
In Exadata Database Machine X7 and later systems, all the hard disks in the cell are data disks. In systems prior to Exadata Database Machine X7, the non-system area of the system disks, referred to as data partitions, is used for normal data storage. All other disks in the cell are data disks.
Starting in Oracle Exadata System Software release 11.2.3.2.0, if there is a disk failure, then Oracle Exadata System Software sends an alert stating that the disk can be replaced, and, after all data has been rebalanced out from that disk, turns on the blue OK to Remove LED for the hard disk with predictive failure. In Oracle Exadata System Software releases earlier than 11.2.3.2.0, the amber Fault-Service Required LED was turned on for a hard disk with predictive failure, but not the blue LED. In these cases, it is necessary to manually check if all data has been rebalanced out from the disk before proceeding with disk replacement.
For a hard disk that has failed, both the blue OK to Remove LED and the amber Fault-Service Required LED are turned on for the drive indicating that disk replacement can proceed. The behavior is the same in all releases. The drive LED light is a solid light in Oracle Exadata System Software releases 11.2.3.2.0 and later; the drive LED blinks in earlier releases.
For example, a hard disk with a status of failed (in earlier releases, the status for failed hard disks was critical) or warning - predictive failure is probably having problems and needs to be replaced. The disk firmware maintains the error counters and marks a drive with predictive failure when internal thresholds are exceeded. The drive, not the cell software, determines whether it needs replacement.
When disk I/O errors occur, Oracle ASM performs bad extent repair for read errors that are due to media errors. The disks stay online, and no alerts are sent. When Oracle ASM gets a read error on a physically addressed metadata block, it does not have mirroring for the block, so it takes the disk offline. Oracle ASM then drops the disk using the FORCE option.
The hard disk controller on each Oracle Exadata Storage Server periodically performs a discharge and charge of the controller battery. During the operation, the write cache policy changes from write-back caching to write-through caching.
A hard disk outage can cause a reduction in performance and data redundancy. Therefore, the disk should be replaced with a new disk as soon as possible. When the disk fails, the Oracle ASM disks associated with the grid disks on the hard disk are automatically dropped with the FORCE option, and an Oracle ASM rebalance follows to restore the data redundancy.
An Exadata alert is generated when a disk fails. The alert includes specific instructions for replacing the disk. If you have configured the system for alert notifications, then the alert is sent by e-mail to the designated address.
After the hard disk is replaced, the grid disks and cell disks that existed on the previous disk in that slot are re-created on the new hard disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced onto them, based on the disk group redundancy and the ASM_POWER_LIMIT parameter.
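The speed of that rebalance is governed by the ASM_POWER_LIMIT initialization parameter, which can also be overridden for a single operation. For example (standard Oracle ASM SQL; the disk group name and power values are illustrative):

```sql
-- Raise the default rebalance power for the ASM instance
ALTER SYSTEM SET asm_power_limit = 4;

-- Or override it for one disk group rebalance only
ALTER DISKGROUP data REBALANCE POWER 8;
```

Higher power values complete the rebalance faster at the cost of more I/O load on the databases sharing the storage.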
The predictive failure status indicates that the hard disk will soon fail, and should be replaced at the earliest opportunity. The Oracle ASM disks associated with the grid disks on the hard drive are automatically dropped, and an Oracle ASM rebalance relocates the data from the predictively failed disk to other disks.
An alert is sent when the disk is removed. After replacing the hard disk, the grid disks and cell disks that existed on the previous disk in the slot are re-created on the new hard disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced based on disk group redundancy and the ASM_POWER_LIMIT parameter.
On all systems prior to Oracle Exadata Database Machine X7, the disks in the first two slots are system disks, which store the operating system and Oracle Exadata System Software. At least one system disk must be in working condition for the server to remain operational.
Starting with Oracle Exadata System Software release 11.2.3.2, an underperforming disk is automatically identified and removed from the active configuration. When CELLSRV detects poor disk performance, the cell disk status changes to normal - confinedOnline, the hard disk status changes to warning - confinedOnline, and Oracle Exadata Database Machine runs a set of performance tests on the disk.
If the disk problem is temporary and passes the tests, then it is brought back into the configuration. If the disk does not pass the tests, then it is marked as poor performance, and Oracle Auto Service Request (ASR) submits a service request to replace the disk. If possible, Oracle ASM takes the grid disks offline for testing. If Oracle ASM cannot take the disks offline, then the cell disk status stays at normal - confinedOnline until the disks can be taken offline safely.
When a hard disk is replaced, the disk must be acknowledged by the RAID controller before it can be used. The acknowledgement does not take long; use the LIST PHYSICALDISK command to ensure the status is NORMAL.
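For example (CellCLI; the slot address 20:5 is illustrative, so substitute the address of the replaced disk):

```
CellCLI> LIST PHYSICALDISK 20:5 ATTRIBUTES name, status
```

Wait until the status column reports normal before relying on the disk.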
Oracle Exadata System Software has a complete set of automated operations for hard disk maintenance when a hard disk has failed or has been flagged as problematic. However, there are situations where a hard disk must be removed proactively from the configuration.
In the CellCLI ALTER PHYSICALDISK command, the DROP FOR REPLACEMENT option checks whether a normally functioning hard disk can be removed safely without the risk of data loss. After the command completes, the grid disks on the hard disk are inactivated on the storage cell and set to offline in the Oracle ASM disk groups.
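For example (CellCLI; the slot address 20:2 is illustrative, so substitute the address of the disk you intend to remove):

```
CellCLI> ALTER PHYSICALDISK 20:2 DROP FOR REPLACEMENT
```

If the safety check fails, the command reports an error instead of inactivating the grid disks.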
Refer to "Shutting Down Exadata Storage Server". Make sure the Oracle ASM disk_repair_time attribute is set to a sufficiently large value so that Oracle ASM does not drop the disks before the grid disks can be activated in another Exadata Storage Server.
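The disk_repair_time attribute can be checked and changed with standard Oracle ASM SQL. For example (the disk group name, group number, and 8.5h value are illustrative):

```sql
-- Check the current value for a disk group
SELECT value
FROM   v$asm_attribute
WHERE  name = 'disk_repair_time' AND group_number = 1;

-- Extend it before taking the cell offline
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '8.5h';
```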
If a flash card fails while in write-back mode, then Oracle Exadata System Software recovers the data that was in the flash cache by reading it from the surviving mirror. The data is then written to the cell that had the failed flash card. Oracle Exadata System Software saves the location of the data lost in the failed flash cache at the time of the flash failure. Resilvering then starts by replacing the lost data with the mirrored copy. During resilvering, the grid disk status is ACTIVE -- RESILVERING WORKING. If the flash cache is in write-through mode, then the data in the failed flash device is already present on the data grid disks, so there is no need for resilvering.
Starting with Oracle Exadata Database Machine X7, the flash devices are hot-pluggable on the Oracle Exadata Storage Servers. When performing a hot-pluggable replacement of a flash device on Oracle Exadata Storage Servers for X7 or later, the disk status should be Dropped for replacement, and the power LED on the flash card should be off, which indicates the flash disk is ready for online replacement.
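Before pulling the card, you can confirm that the flash disk has reached the expected state. For example (CellCLI; the WHERE clause filters on standard physical disk attributes):

```
CellCLI> LIST PHYSICALDISK WHERE diskType = 'FlashDisk' AND status = 'Dropped for replacement' DETAIL
```

Only flash disks that appear in this output, with the card's power LED off, are ready for online replacement.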
On Oracle Exadata Database Machine X7 and later systems, all flash disks are in the form of an Add-in-Card (AIC), which is inserted into a PCIe slot on the motherboard. The slotNumber attribute shows the PCI number and FDOM number, regardless of whether it is an EF or HC storage server.