REVIEW
In my previous posts I logged my experience installing the 11gR2 Grid Infrastructure and getting familiar with ASM. At this point I have a working ASM instance, as well as Oracle High Availability and Cluster Services running, and an ACFS file system, for kicks.
My ASM Disk groups available are:
ASMDATA - Where my database files will go
FRA - Where my Flash Recovery Area will go
FS - Hosts my ACFS partition, where database files will go that wouldn't normally go on an ASM disk.
My goal in this post is not to reproduce the steps in the manual, but to record my own notes of things that are interesting or troublesome along the way. I'm following the steps as documented in the Database Installation Guide 11g Release 2 (11.2) for Linux.
I already configured the kernel parameters and made sure the necessary libraries were installed, and I already created a separate user account (oracle11r2) to host all my 11.2.0.1 files. ASM was installed under the user oragrid, to keep the installations distinct.
HERE WE GO:
First note of "something new" in the install guide is the enhanced Secure Configuration option. I'm a security nut, so I'll be taking this option, especially since it gives more opportunity to learn.
Files are unzipped, X DISPLAY is verified, let's do this...
$ ./runInstaller
You do not have sufficient permissions to access the inventory '/u01/app/oraInventory/orainstRoot.sh.ASMbackup'...
Hit the first issue sooner than I thought.
$ ls /u01/app/oraInventory/orainstRoot.sh.ASMbackup
-rwxr-x--- 1 oragrid oinstall 1623 Sep 8 23:08 /u01/app/oraInventory/orainstRoot.sh.ASMbackup
This one's my own fault. I made a backup of the ASM root.sh script after the ASM install. It does not have group write permissions. Apparently runInstaller checks to see if all directories are accessible, not just specific directories that it will use. I changed the group write permission on the directory, since I don't expect it to be touched anyway. Now runInstaller is able to move onto the prerequisites, and we get the spiffy new 11gR2 Installer GUI.
Note: I'm posting this blog on a different machine, so I'm not taking screen shots as I go. There may be typos since I'm re-typing instead of copy-and-pasting. If you find one, please let me know so I can fix the post.
CONFIGURE SECURITY OPTIONS
I'm not setting this up for a connection to MetaLink so I uncheck that, and leave the email blank for now. I get a dialog box letting me know that Uncle Larry won't be looking over my shoulder to inform me of security updates now. That's fine with me for this project.
INSTALLATION OPTION
I'm selecting "Create and configure a database". I can always create another later for the DBCA experience. I'm more curious what this installer does on autopilot.
SYSTEM CLASS
Even though I am using a desktop class machine, in power, I have been treating it as a server, in function. ASM features are only available under this option, so I'm choosing "Server Class" here.
GRID OPTIONS
While I would really like to play with RAC features here at home, I know I don't have the infrastructure for it, so I won't risk confusing myself with a single-node RAC-ish installation. I'm sticking with "Single instance database installation" for this option.
INSTALL TYPE
The whole point of this exercise is to learn ASM, so I need to choose "Advanced Install" here, for the opportunity to install on ASM. (Of course I could start with non-ASM and convert to ASM as another exercise, but I have not layed out my partitions to do so).
PRODUCT LANGUAGES
Self-explanatory.
DATABASE EDITION
Enterprise, Standard, or Standard Edition ONE? Here is where licensing would come into play. Since this is under an OTN "get familiar, non-production" license, I'm going with EE with all the options, including Database Vault.
INSTALLATION LOCATION
Oracle Base: /u01/app/oracle11r2
Software Location: /u01/app/oracle11r2/product/11.2.0/dbhome_1
I'll take the defaults here, since I expect most places I go will install things in the default layouts. If they don't I'll be able to spot the difference right away.
Since the Oracle Inventory was created when I installed Grid Infrastructure, I'm not given that option here. I like this. I have enough ways to shoot myself in the foot. If Oracle can prevent me from a confusing multiple-inventory situation, I appreciate it.
CONFIGURATION TYPE
For now I'm interested in "General Purpose / Transaction Processing".
DATABASE IDENTIFIERS
Global Database Name: ora11r2.localdomain
SID: ora11r2
Normally I would change the localdomain to something appropriate, but I'm not worried about that for this machine. I left it at that default in the Infrastructure install, so that's what the Listener is expecting. I'll leave localdomain here, too.
CONFIGURATION OPTIONS
MEMORY
Enable Auto Memory Management. Yes, please. I'll start with the default 40% of memory (604MB in my case). In the real world I would do some capacity planning beforehand and size this based on a more educated guess, then tune as needed over time.
CHARACTER SETS
I'm taking the default here. I'll play with UNICODE in another instance, another time.
SECURITY
Assert All New Security Settings. I'm not upgrading an older database, and I am interested in learning new features, so I'll stick to the default here, which is yes... use the new security settings.
SAMPLE SCHEMAS
Good, the default is unchecked. I like that. I'll check it here, but for an application database I prefer to not have the sample schemas - less to secure or clean up.
MANAGEMENT OPTIONS
I don't have a Grid Control installation, or agent here, so that's grayed out. That leaves DB Control, which is fine. I'm skipping email notifications for now.
DATABASE STORAGE
Here's where we get interesting. ASM is selected by default. I put in the password I created for ASMSNMP during the ASM installation. There were fewer options than I expected here.
AUTOMATED BACKUPS
I haven't allocated space for multiple backups, so I'll leave the automated backups disabled. 11gR2 RMAN features and backup strategies will be an exercise for another time.
ASM DISK GROUP
Here is the option I expected to see earlier. My database files will go on ASMDATA.
SCHEMA PASSWORDS
Self-explanatory.
OPERATING SYSTEM GROUPS
Default.
PREREQUISITE CHECKS
I ran into the same prerequisite failure here on pdksh that I did on the Infrastructure installation. I know I have ksh available, even if it is not pdksh, so I'm Ignoring this issue and continuing. I wouldn't ignore this on a system that needs to remain stable.
SUMMARY
I'll save a response file (ora11r2.rsp) to use or review later.
INSTALL PRODUCT
Now we wait while OUI does it's thing.
For status freaks like me, who like to know more than just watching the checklist of what's been done so far, the Details window lets me watch the install log and scroll back through the history without 'tail'ing it in a separate window. Nice touch.
I didn't keep a timer, but for a full EE install on an underpowered machine, the install process moves along pretty well. I'm eager for the opportunity to see it on some bigger iron.
DBCA, on the other hand, is really pokey, but I don't blame DBCA. Now we're hitting the expected performance problems of my machine.
My system is CPU-bound. CPU is pegged at 100% for extended periods. No surprise on this box.
Memory is in good shape - hovering around 30% used.
top shows the database being created using more resources than the ASM database.
iostat DOES show activity spread out across the various partitions representing my ASM disks.
At this point I am seeing ASM get its first workout and I see the load spreading across what would normally be multiple devices. If you didn't read my first post on learning ASM, you missed the fact that I'm simulating 10 disks using 10 partitions on a single physical drive. High Speed anything was never a goal in this exercise.
dbca finished, root.sh run.
The root.sh script complained about dbhome, oraenv, and coraenv already existing in /usr/local/bin. I made backup copies of these and allowed root.sh to overwrite them. diff showed the new files identical to the old, and permissions are such that oragrid won't have a problem, so it appears to be OK to just overwrite what the Grid Installation installed.
FINISH
This screen simply reminds us of the URL for DB Control. Closed the installer.
TEST DRIVE
Start FireFox, browse to the EM URL, accept the security certificate, and ... waiting ...
DB Control reports the database components are unavailable.
Attempting to connect via sqlplus also fails, with a TNS error.
Clearly, something is misconfigured at this point. Before banging my head and making things worse, I turn the page to the Post-Installation Tasks.
I reboot just to clean things up a bit, and start a piece at a time. I notice my network interface failed on boot - that explains a number of things. I corrected the networking issue and lsnrctl status under oragrid responded as expected.
I login as oracle11r2, set ORACLE_HOME and ORACLE_SID as needed.
Now sqlplus works, and I see the database is already up. This is odd, t me, since the startup flag in /etc/oratab for this instance is 'N'. Oracle Restart is coming into play here. The startup configuration I already did for ASM laid the groundwork for the new instance "ora11r2" being started automatically.
$ ./emctl start dbconsole
DB Console appeared to come up, but crashed upon first login attempt.
I set up the .bash_profile for oracle11r2 to have the proper ORACLE_HOME and ORACLE_SID, created a DBA ID for myself in the database (so I didn't have to login as SYS), then I restarted the DB Console.
When DB Console is running, it appears to be the big CPU hog on this machine. CPU and disk I/O is moderate and acceptable when DB Console is not running.
POST_INSTALLATION TASKS
I followed the relevant steps for post-installation tasks that applied to my instance.
Via DB Console, I found my way to the Flash Recovery Area setting. I changed this from +ASMDATA to +FRA and increased the size to make use of more of the ASM disk group allocated to flash recovery.
THE END, AND THE BEGINNING
At this point Oracle 11g Release 2 is up and running on a tiny machine, with ASM.
Let the learning continue...
Tuesday, September 22, 2009
Friday, September 18, 2009
Learning ASM: Breakin' the Rules (or, Running ACFS on CentOS)
Part 2 mentioned that I could not get ACFS to work on CentOS. This is, of course, because it is not supported. I did find a way to work around this.
---------------------------------------------------------------------------------------
DISCLAIMER:
This post is provided as a log of personal experimentation only. If you attempt to replicate these steps on your system, it is wholly at your own risk with no guarantee you will get the same results as I did. It is in no way supported by Oracle, and I take no responsibility for any ill effects on your system - and Oracle won't either. This is wholly undocumented and unrecommended. I'm purposely changing things to understand them better. If you want a stable (and simpler) experience, use the Linux versions documented to be supported by Oracle.
---------------------------------------------------------------------------------------
With the lawyers held at bay, I can proceed. Remember, don't try this at work, kids - only on a system no one else uses, that you don't mind reinstalling if things go haywire.
My environment (CentOS 5.3)
$ uname -r
2.6.18-128.el5
According to the Oracle Database Storage Administrator's Guide ACFS on a non-clustered system requires some manual steps to launch the volume driver. The steps outlined at the hyperlink above detail what to do manually.
login as root
$ cd /u01/app/oragrid/product/11.2.0/grid/bin
$ ./acfsload start -s
ADVM/ACFS is not supported on centos-release-5-3.el5.centos.1
Not surprising, the documentation said as much. Let's see if we can work around that.
Some backtracking through the shell and perl scripts leads us to ../lib/osds_acfslib.pm
$ cd ../lib
$ cp -p osds_acfslib.pm osds_acfslib.pm.orig
Now I edit osds_acfslib.pm (carefully) Search for "support this release" to get to the section to tweak. (line 278 in my case)
Since CentOS5 is equivalent to RedHat 5, we'll modify the test for RedHat to look like this...
if (($release =~ /enterprise-release-5/) ||
($release =~ /redhat-release-5/) ||
($release =~ /centos-release-5/))
That gets us past the supported version check, based on the version string. Now we get ACFS-9129: ADVM/ACFS not installed
$ mkdir /lib/modules/2.6.18-128.el5/extra/usm
$ cp /u01/app/oragrid/product/11.2.0/grid/install/usm/EL5/i386/2.6.18-8/2.6.18-8.el5-i686/bin/*ko /lib/modules/2.6.18-128.el5/extra/usm
$ chmod 744 /lib/modules/2.6.18-128.el5/extra/usm
This puts the modules in the expected location. We can check it by:
$ cd /u01/app/oragrid/product/11.2.0/grid/bin
$ ./acfsdriverstate -orahome /u01/app/oragrid/product/11.2.0/grid version
ACFS-9205: OS/ADVM,ACFS installed version = 2.6.18-8.el5(i386)/090715.1
Record dependencies for the kernel modules. (The modules won't load if you skip this step).
$ depmod
Now running acfsload start -s completes without any further messages.
In a cluster install the system executables get copied from the install directory to /sbin as part of the root.sh script. Since this was a standalone install, they did not get copied. We get to do that manually (and set permissions).
$ cd /u01/app/oragrid/product/11.2.0/grid/install/usm/EL5/i386/2.6.18-8/2.6.18-8.el5-i686/bin
For each file that does not end in .ko (there are 12 of them):
$ cp acfsdbg /sbin
$ chmod 755 /sbin/acfsdbg
TEST FOR SUCCESS:
Now connect as oragrid.
Running asmca is noticeably different - the Volumes and ASM Cluster File System tabs are available. (They were greyed out when I tried before).
Now we'll try the command that failed before in ASMCMD
ASMCMD > volcreate -G FS -s 2G --redundancy mirror acfs1
This completed without any error message. Switching back to asmca to see it in the GUI...
ACFS1 now appears in the Volume tab, with a device /dev/asm/acfs1-287. It is unmounted. We could also find out this virtual device using the volinfo command in asmcmd.
Now make a file system on that volume and mount it...
To mount a filesystem, you need root privilege.
add the following to the beginning of /sbin/mount.acfs, after the header comments
ORACLE_HOME=/u01/app/oragrid/product/11.2.0/grid
Create a directory to mount the filesystem on. This step wasn't in the install guide. I got an error if I didn't pre-create a directory.
$ mkdir /u01/app/acfsmounts/acfstrial
Perform the mount and set permissions
$ mount -t acfs /dev/asm/acfs1-287 /u01/app/acfsmounts/acfstrial
$ chown -R oracle11r2:dba /u01/app/acfsmounts/acfstrial
And we are done! A usable file system, owned by the ID we will later install the Oracle 11gR2 Database using.
$ su - oracle11r2
$ cd /u01/app/acfsmounts/acfstrial
$ ls
lost+found
$ echo hello world > myfile
$ ls
lost+found myfile
$ cat myfile
hello world
And there we have it... ASM Cluster File System running on a standalone CentOS box. Now I have a place to experiment and build up my skill set, with no money out of pocket!
EPILOGUE (TODO):
This doesn't come up automatically following a reboot. I need to add startup scripts to perform the following steps:
1. Load the ADVM drivers
2. Enable the ASM volumes
3. Mount the ACFS filesystems
I can do these steps manually with a combination of the steps above, and with asmca.
---------------------------------------------------------------------------------------
DISCLAIMER:
This post is provided as a log of personal experimentation only. If you attempt to replicate these steps on your system, it is wholly at your own risk with no guarantee you will get the same results as I did. It is in no way supported by Oracle, and I take no responsibility for any ill effects on your system - and Oracle won't either. This is wholly undocumented and unrecommended. I'm purposely changing things to understand them better. If you want a stable (and simpler) experience, use the Linux versions documented to be supported by Oracle.
---------------------------------------------------------------------------------------
With the lawyers held at bay, I can proceed. Remember, don't try this at work, kids - only on a system no one else uses, that you don't mind reinstalling if things go haywire.
My environment (CentOS 5.3)
$ uname -r
2.6.18-128.el5
According to the Oracle Database Storage Administrator's Guide ACFS on a non-clustered system requires some manual steps to launch the volume driver. The steps outlined at the hyperlink above detail what to do manually.
login as root
$ cd /u01/app/oragrid/product/11.2.0/grid/bin
$ ./acfsload start -s
ADVM/ACFS is not supported on centos-release-5-3.el5.centos.1
Not surprising, the documentation said as much. Let's see if we can work around that.
Some backtracking through the shell and perl scripts leads us to ../lib/osds_acfslib.pm
$ cd ../lib
$ cp -p osds_acfslib.pm osds_acfslib.pm.orig
Now I edit osds_acfslib.pm (carefully) Search for "support this release" to get to the section to tweak. (line 278 in my case)
Since CentOS5 is equivalent to RedHat 5, we'll modify the test for RedHat to look like this...
if (($release =~ /enterprise-release-5/) ||
($release =~ /redhat-release-5/) ||
($release =~ /centos-release-5/))
That gets us past the supported version check, based on the version string. Now we get ACFS-9129: ADVM/ACFS not installed
$ mkdir /lib/modules/2.6.18-128.el5/extra/usm
$ cp /u01/app/oragrid/product/11.2.0/grid/install/usm/EL5/i386/2.6.18-8/2.6.18-8.el5-i686/bin/*ko /lib/modules/2.6.18-128.el5/extra/usm
$ chmod 744 /lib/modules/2.6.18-128.el5/extra/usm
This puts the modules in the expected location. We can check it by:
$ cd /u01/app/oragrid/product/11.2.0/grid/bin
$ ./acfsdriverstate -orahome /u01/app/oragrid/product/11.2.0/grid version
ACFS-9205: OS/ADVM,ACFS installed version = 2.6.18-8.el5(i386)/090715.1
Record dependencies for the kernel modules. (The modules won't load if you skip this step).
$ depmod
Now running acfsload start -s completes without any further messages.
In a cluster install the system executables get copied from the install directory to /sbin as part of the root.sh script. Since this was a standalone install, they did not get copied. We get to do that manually (and set permissions).
$ cd /u01/app/oragrid/product/11.2.0/grid/install/usm/EL5/i386/2.6.18-8/2.6.18-8.el5-i686/bin
For each file that does not end in .ko (there are 12 of them):
$ cp acfsdbg /sbin
$ chmod 755 /sbin/acfsdbg
TEST FOR SUCCESS:
Now connect as oragrid.
Running asmca is noticeably different - the Volumes and ASM Cluster File System tabs are available. (They were greyed out when I tried before).
Now we'll try the command that failed before in ASMCMD
ASMCMD > volcreate -G FS -s 2G --redundancy mirror acfs1
This completed without any error message. Switching back to asmca to see it in the GUI...
ACFS1 now appears in the Volume tab, with a device /dev/asm/acfs1-287. It is unmounted. We could also find out this virtual device using the volinfo command in asmcmd.
Now make a file system on that volume and mount it...
$ /sbin/mkfs -t acfs /dev/asm/acfs1-287
mkfs.acfs: version = 11.2.0.1.0.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/acfs1-287
mkfs.acfs: volume size = 2147483648
mkfs.acfs: Format complete.
$ /sbin/acfsutil registry -a /dev/asm/acfs1-287 /u01/app/acfsmounts/acfstrialAt this point, the ASM Cluster File Systems tab in asmca shows our registered mount point, in a DISMOUNTED state. ls /u01/app shows no directory .../acfsmounts.
acfsutil registry: mount point /u01/app/acfsmounts/acfstrial successfully added
to Oracle Registry
To mount a filesystem, you need root privilege.
add the following to the beginning of /sbin/mount.acfs, after the header comments
ORACLE_HOME=/u01/app/oragrid/product/11.2.0/grid
Create a directory to mount the filesystem on. This step wasn't in the install guide. I got an error if I didn't pre-create a directory.
$ mkdir /u01/app/acfsmounts/acfstrial
Perform the mount and set permissions
$ mount -t acfs /dev/asm/acfs1-287 /u01/app/acfsmounts/acfstrial
$ chown -R oracle11r2:dba /u01/app/acfsmounts/acfstrial
And we are done! A usable file system, owned by the ID we will later install the Oracle 11gR2 Database using.
$ su - oracle11r2
$ cd /u01/app/acfsmounts/acfstrial
$ ls
lost+found
$ echo hello world > myfile
$ ls
lost+found myfile
$ cat myfile
hello world
And there we have it... ASM Cluster File System running on a standalone CentOS box. Now I have a place to experiment and build up my skill set, with no money out of pocket!
EPILOGUE (TODO):
This doesn't come up automatically following a reboot. I need to add startup scripts to perform the following steps:
1. Load the ADVM drivers
2. Enable the ASM volumes
3. Mount the ACFS filesystems
I can do these steps manually with a combination of the steps above, and with asmca.
Labels:
11gR2,
ACFS,
CentOS,
Oracle Grid Infrastructure,
Unsupported
Wednesday, September 9, 2009
Learning ASM: Part 2
In the previous post, I installed ASM on a woefully underpowered machine. Things I learned along the way included:
ASMCMD is the command-line tool for managing ASM. You can also manage it through Enterprise Manager, but that is not installed yet, so not an option. I always like to know how to do things at the command line, since that is always available. For this section I'm using a chapter from the Oracle Database Storage Administrators Guide .
First we need to make sure ORACLE_HOME and ORACLE_SID are set appropriately. This was not a step in the install guide, or I missed it. I added the following to my .bash_profile for oragrid
umask 022 ORACLE_HOME=/u01/app/oragrid/product/11.2.0/grid ORACLE_SID=+ASM export ORACLE_HOME ORACLE_SID PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin export PATH
Now asmcmd will start properly
[oragrid@loclahost ~]$ . .bash_profile [oragrid@loclahost ~]$ asmcmd ASMCMD> ls ASMDATA/ FRA/ FS/
Next I spent some time playing around with ASM commands. shutdown and startup were straightforward and behaved as expected. I exited asmcmd and restarted it with th e-p and -v options to show the current directory in the prompt (-p) and more verbose output (-v). At the asmcmd prompt, 'help' displays the list of valid asm commands. I could go through the manual, but I like to poke around sometimes on my own in this situation.
ls and cd navigate through the directory. ls -l provides a longer list, as I expected.
ASMCMD [+] > cd +ASMDATA/asm/asmparameterfile ASMCMD [+ASMDATA/asm/asmparameterfile] > ls REGISTRY.253.697071913 ASMCMD [+ASMDATA/asm/asmparameterfile] > ls -l Type Redund Striped Time Sys Name ASMPARAMETERFILE MIRROR COARSE SEP 08 23:00:00 Y REGISTRY.253.697071913
Wonder where a file is located?
ASMCMD [+]> find + registry* +ASMDATA/ASM/ASMPARAMETERFILE/REGISTRY.253.697071913
Need to manage the available disk groups?
ASMCMD [+] > ls -l State Type Rebal Name MOUNTED NORMAL N ASMDATA/ MOUNTED NORMAL N FRA/ MOUNTED NORMAL N FS/ ASMCMD [+] > umount FS ASMCMD [+] > ls -l State Type Rebal Name MOUNTED NORMAL N ASMDATA/ MOUNTED NORMAL N FRA/ ASMCMD [+] > mount FS
ASMCMD [+] > ls -l State Type Rebal Name MOUNTED NORMAL N ASMDATA/ MOUNTED NORMAL N FRA/ MOUNTED NORMAL N FS/
At certain points, such as when a new device is added to a disk group, ASM will rebalance the data among the devices in a group to spread the I/O evenly throughout. You can also force a rebalance to happen immediately.
ASMCMD [+] > rebal FRA
While the Rebalance is occuring, ls -l will report the diskgroup with a Y in the Rebal column.
Curios about the I/O load across disks?
ASMCMD [+] > iostat Group_Name Dsk_Name Reads Writes ASMDATA DATA1 7421952 5046272 ASMDATA DATA2 8433664 2883584 ...
Need to check integrity of a disk grou? chkdg
Did your disk develop bad sectors? the remap command will avoid them. (for more info, help remap).
Let's try taking a disk out of a disk group
ASMCMD [+] > offline -G ASMDATA -D DATA3 ORA-15032: not all alterations performed ORA-15283: ASM operation requires compatible.rdbms of 11.1.0.0.0 or higher Really? Why is that?
ASMCMD [+] > lsattr -G ASMDATA -l rdbms
Name Value
compatible.rdbms 10.1.0.0.0
Interesting, the default is to be compatible with a 10g database, not an 11g database. This setting is probably to assist with upgrades, since 10g will be more prevalent still that 11g for a while. We don't have a 10g database right now, so I'll change it to test the offlining. ASMCMD [+] > setattr -G ASMDATA compatible.rdbms 11.1.0.0.0
Now, let's try that offline again:
ASMCMD [+] > offline -G ASMDATA -D DATA3 ASMCMD [+] > lsdsk -G ASMDATA Path ORCL:DATA1 ORCL:DATA2 ORCL:DATA4 ORCL:DATA5 ORCL:DATA6
GETTIN' CRAZY: CREATING AN ACFS VOLUME VIA ASMCMD (FAIL)
To have ASM act as a storage manager, or volume manager, for a general-purpose file system, we first define a volume within a disk group. We have the FS disk group defined for this purpose, so we'll create 2 volumes within that.
ASMCMD [+] > volcreate -G FS -s 2G --redundancy mirror acfs1
ORA-15032: not all alterations performed
ORA-15477: cannot communicate with the volume driver
Apparently something else is not installed. According to documentation, ACFS is only supported on Oracle Enterprise Linux 5 or Red Had Enterprisxe Linux 5. CentOS 5 is supposed to be equivalent to RHEL, perhaps this is a point where it is not equivalent enough?
My lesson to myself here - stick with supported O/S versions. Oracle Enterprise Linux is supported, and designed around running ASM and the Oracle Database. I'll be switching to that later to study ACFS, but in the meantime I'll table the ACFS exploration and continue looking at ASM and 11gR2.
REBOOT: GETTING ASM TO START AUTOMATICALLY ON BOOT
I want ASM to start on boot, so I modified /etc/oratab to start +ASM.
I stopped ASM using ASMCMD
ASMCMD> shutdown --immediate
ASM diskgroups dismounted
ASM instance shutdown
Connected to an idle instance.
The O/S was restarted.
Following the restart, a ps-fu oragrid showed a few processes running, but not enough.
ASMCMD startup yielded an ORA-01078: failure in processing system parameters, and ORA-29701: unable to connect to Cluster Synchronization Service. The supporting cluster software is not fully functioning. To start it:
login as root
cd /u01/app/oragrid/product/11.2.0/grid/bin
./crsctl start resource -all
Now ps -fu oragrid shows an instance running. logout of root, back in as oragrid, and now asmcmd shows the disk groups available. Interestingly, the startup=Y was changed back to N in /etc/oratab for the +ASM instance. Note to self: +ASM should not be managed by oratab.
The installer added oracleasm and ohasd to the /etc/init.d and rc3.d and rc5.d, but not the cluster services. I created an init script to do the steps we just did manually
cd /etc/init.d
cp -p ohasd ocrs
Edit ocrs changing the start() subroutine to include
/u01/app/oragrid/product/11.2.0/grid/bin/crsctl start resource -all
and the stop() subroutine to include
/u01/app/oragrid/product/11.2.0/grid/bin/crsctl stop resource -all
as well as modifying any descriptive text in the script.
I created symbolic links in rc5.d and rc3.d for Start and Stop at step 97 (after OHASD has started)
cd ../rc5.d
ln -s /etc/init.d/ocrs S97ocrs
ln -s /etc/init.d/ocrs K97ocrs
cd ../rc3.d
(same thing)
And now ASM starts on boot.
Next post: actually installing a database on 11gR2 ASM.
- Don't skip steps. If Oracle says install Grid Infrastructure First, don't start by setting up a user id for the Oracle Database.
- Partitions on a single physical drive will work for experimenting with ASM, but for hopefully obvious reasons is not something you would do on a system that actually supports development or production.
- 11g Release 2 changes a few of the familiar installation steps, especially the Oracle listener.
- It is helpful to plan (and document) your desired disk group layout before you start. ASM will manage everything for you, but that is no substitute for a diagram on a piece of paper to keep everything straight.
ASMCMD is the command-line tool for managing ASM. You can also manage it through Enterprise Manager, but that is not installed yet, so not an option. I always like to know how to do things at the command line, since that is always available. For this section I'm using a chapter from the Oracle Database Storage Administrators Guide .
First we need to make sure ORACLE_HOME and ORACLE_SID are set appropriately. This was not a step in the install guide, or I missed it. I added the following to my .bash_profile for oragrid
umask 022 ORACLE_HOME=/u01/app/oragrid/product/11.2.0/grid ORACLE_SID=+ASM export ORACLE_HOME ORACLE_SID PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin export PATH
Now asmcmd will start properly
[oragrid@loclahost ~]$ . .bash_profile [oragrid@loclahost ~]$ asmcmd ASMCMD> ls ASMDATA/ FRA/ FS/
Next I spent some time playing around with ASM commands. shutdown and startup were straightforward and behaved as expected. I exited asmcmd and restarted it with th e-p and -v options to show the current directory in the prompt (-p) and more verbose output (-v). At the asmcmd prompt, 'help' displays the list of valid asm commands. I could go through the manual, but I like to poke around sometimes on my own in this situation.
ls and cd navigate through the directory. ls -l provides a longer list, as I expected.
ASMCMD [+] > cd +ASMDATA/asm/asmparameterfile ASMCMD [+ASMDATA/asm/asmparameterfile] > ls REGISTRY.253.697071913 ASMCMD [+ASMDATA/asm/asmparameterfile] > ls -l Type Redund Striped Time Sys Name ASMPARAMETERFILE MIRROR COARSE SEP 08 23:00:00 Y REGISTRY.253.697071913
Wonder where a file is located?
ASMCMD [+]> find + registry* +ASMDATA/ASM/ASMPARAMETERFILE/REGISTRY.253.697071913
Need to manage the available disk groups?
ASMCMD [+] > ls -l State Type Rebal Name MOUNTED NORMAL N ASMDATA/ MOUNTED NORMAL N FRA/ MOUNTED NORMAL N FS/ ASMCMD [+] > umount FS ASMCMD [+] > ls -l State Type Rebal Name MOUNTED NORMAL N ASMDATA/ MOUNTED NORMAL N FRA/ ASMCMD [+] > mount FS
ASMCMD [+] > ls -l State Type Rebal Name MOUNTED NORMAL N ASMDATA/ MOUNTED NORMAL N FRA/ MOUNTED NORMAL N FS/
At certain points, such as when a new device is added to a disk group, ASM will rebalance the data among the devices in a group to spread the I/O evenly throughout. You can also force a rebalance to happen immediately.
ASMCMD [+] > rebal FRA
While the Rebalance is occuring, ls -l will report the diskgroup with a Y in the Rebal column.
Curios about the I/O load across disks?
ASMCMD [+] > iostat Group_Name Dsk_Name Reads Writes ASMDATA DATA1 7421952 5046272 ASMDATA DATA2 8433664 2883584 ...
Need to check integrity of a disk grou? chkdg
Did your disk develop bad sectors? the remap command will avoid them. (for more info, help remap).
Let's try taking a disk out of a disk group
ASMCMD [+] > offline -G ASMDATA -D DATA3 ORA-15032: not all alterations performed ORA-15283: ASM operation requires compatible.rdbms of 11.1.0.0.0 or higher Really? Why is that?
ASMCMD [+] > lsattr -G ASMDATA -l rdbms
Name Value
compatible.rdbms 10.1.0.0.0
Interesting, the default is to be compatible with a 10g database, not an 11g database. This setting is probably to assist with upgrades, since 10g will be more prevalent still that 11g for a while. We don't have a 10g database right now, so I'll change it to test the offlining. ASMCMD [+] > setattr -G ASMDATA compatible.rdbms 11.1.0.0.0
Now, let's try that offline again:
ASMCMD [+] > offline -G ASMDATA -D DATA3 ASMCMD [+] > lsdsk -G ASMDATA Path ORCL:DATA1 ORCL:DATA2 ORCL:DATA4 ORCL:DATA5 ORCL:DATA6
GETTIN' CRAZY: CREATING AN ACFS VOLUME VIA ASMCMD (FAIL)
To have ASM act as a storage manager, or volume manager, for a general-purpose file system, we first define a volume within a disk group. We have the FS disk group defined for this purpose, so we'll create 2 volumes within that.
ASMCMD [+] > volcreate -G FS -s 2G --redundancy mirror acfs1
ORA-15032: not all alterations performed
ORA-15477: cannot communicate with the volume driver
Apparently something else is not installed. According to documentation, ACFS is only supported on Oracle Enterprise Linux 5 or Red Had Enterprisxe Linux 5. CentOS 5 is supposed to be equivalent to RHEL, perhaps this is a point where it is not equivalent enough?
My lesson to myself here - stick with supported O/S versions. Oracle Enterprise Linux is supported, and designed around running ASM and the Oracle Database. I'll be switching to that later to study ACFS, but in the meantime I'll table the ACFS exploration and continue looking at ASM and 11gR2.
REBOOT: GETTING ASM TO START AUTOMATICALLY ON BOOT
I want ASM to start on boot, so I modified /etc/oratab to start +ASM.
I stopped ASM using ASMCMD
ASMCMD> shutdown --immediate
ASM diskgroups dismounted
ASM instance shutdown
Connected to an idle instance.
The O/S was restarted.
Following the restart, a ps-fu oragrid showed a few processes running, but not enough.
ASMCMD startup yielded an ORA-01078: failure in processing system parameters, and ORA-29701: unable to connect to Cluster Synchronization Service. The supporting cluster software is not fully functioning. To start it:
login as root
cd /u01/app/oragrid/product/11.2.0/grid/bin
./crsctl start resource -all
Now ps -fu oragrid shows an instance running. logout of root, back in as oragrid, and now asmcmd shows the disk groups available. Interestingly, the startup=Y was changed back to N in /etc/oratab for the +ASM instance. Note to self: +ASM should not be managed by oratab.
The installer added oracleasm and ohasd to the /etc/init.d and rc3.d and rc5.d, but not the cluster services. I created an init script to do the steps we just did manually
cd /etc/init.d
cp -p ohasd ocrs
Edit ocrs changing the start() subroutine to include
/u01/app/oragrid/product/11.2.0/grid/bin/crsctl start resource -all
and the stop() subroutine to include
/u01/app/oragrid/product/11.2.0/grid/bin/crsctl stop resource -all
as well as modifying any descriptive text in the script.
I created symbolic links in rc5.d and rc3.d for Start and Stop at step 97 (after OHASD has started)
cd ../rc5.d
ln -s /etc/init.d/ocrs S97ocrs
ln -s /etc/init.d/ocrs K97ocrs
cd ../rc3.d
(same thing)
And now ASM starts on boot.
Next post: actually installing a database on 11gR2 ASM.
Labels:
11gR2,
ASM,
ASMCMD,
Oracle Grid Infrastructure
Tuesday, September 8, 2009
Learning ASM: Part 1
Oracle's ASM (Automatic Storage Management) may well be one of the most significant additions to the Oracle database, beginning in 10g, and becomes more interesting in 11g Release 2. I wanted to learn it at home, before I run into it at a client, and because it is extremely interesting technology. This post logs my experiences setting it up for the first time. This may become my personal HOWTO reference on the topic - expect to see extreme detail as we get into it.
About ASM:
If you are unfamiliar with ASM, it is storage management software that comes with the database. It provides the performance of using RAW devices without the need to manage an storage volume through the operating system or a logical volume manager. For the DBA this translates into more direct control over the storage and more efficiency. For the business, this may translate into reduced costs, if it saves on a license for 3rd-party volume management.
You provide ASM with a pool of storage devices, and then tell ASM how to divide that up logically. You can add or remove devices as needed, without interrupting the database, and ASM will automatically spread your data and load across all the devices. This can be a viable substitute for RAID systems, eliminating another layer of software and hardware. (However, if you have a solid RAID strategy in place already, you can use ASM for managing the volumes, and still let the RAID handle redundancy.)
In 11g Release 2 it gets even more interesting with the introduction of ASM Cluster File System (ACFS). ACFS provides an operating-system-usable file system running on top of Oracle ASM. "Why would I do that?" to further simplify the life of the DBA. In 10g, ASM could not be used for files outside the database, such as binaries, log files, etc. With ACFS all of those files can also be moved into ASM storage. You can set up a cluster of machines running ASM and it can pool storage across the entire cluster, and present it to Oracle and/or the filesystem as if it were a big pool of local storage.
The setup:
I'm slow to adopt new hardware at home, so to make this more interesting I am going to be exploring this on a single-CPU Celeron machine, circa 2005. I don't expect speed or performance. I just want to learn the syntax and management. And yes, Oracle will run on this... just don't expect to do much with it besides self-education. This is a home-class machine, so I have 2 hard drives. One with some older data in a Windows partition I want to save, The other a larger drive with a Windows partition, a Linux partition, and a bunch of extra space to play with.
Hardware: Single CPU Celeron
RAM 1.5 GB
Storage: 1 250GB Hard Drive, in multiple partitions.
ASM is designed to pool storage across multiple devices. I don't have multiple devices, but can simulate that for the sake of education by using multiple partitions. I fully expect this to perform slowly since these are on the same physical device.
FIRST STEP: SET UP PARTITIONS AND INSTALL LINUX
I'm going with CentOS 5.3, which is equivalent to RHEL 5. I had previously installed 11g Release 1 on this machine, so I know the Oracle Database will run, at least.
Here is my partition layout, yours may be different.
/dev/hda
/dev/hda1 Windows Partition NTFS 102398 MB
/dev/hda5 / ext3 23454 MB
/dev/hda6-15 10 virtual physical devices LVM PV 10236 MB each
/dev/hda16 swap swap 10236 MB
In a real system I wouldn't lump root, /home, etc all on the same partition, but for this exercise I'm going simple. I made my partitions roughly 10GB in size for simplicity later.
After this I finished the Linux installation, and made sure all the required packages for the Oracle Database were installed, as directed in the 11gR2 Installation Guide for Linux.
NEXT STEP: CONFIGURE LINUX
In reviewing the documentation for 11gR2 I found this little nugget... "Oracle Database 11g Release 2 introduces the Grid Infrastructure installation. For single instance databases, the grid infrastructure includes Automatic Storage Management, the listener, and Oracle Restart."..."If you want to use grid infrastructure for a standalone server, then you must install the oracle software from the grid infrastructure media before you install the database." In 11gR1 ASM was included on the Database Install media. In this release it is separate.
(What's New in Oracle Database 11g )
(If you are following this as a HOWTO, hold off on this section until AFTER Oracle Grid Infrastructure is installed in the next step. I got ahead fo myself - don't get yourself into the same trouble I did.) I prepared to install the Oracle 11gR2 software according to according to the Quick Install Guide.
INFRASTRUCTURE INSTALL: ASM LIBRARIES
I installed ASM according to the Oracle Grid Infrastructure Install Guide.
oracleasm createdisk data3 /dev/hda8
oracleasm createdisk data4 /dev/hda9 oracleasm createdisk data5 /dev/hda10 oracleasm createdisk data6 /dev/hda11
At this point I unzipped my install media, and ran the Grid Infrastructure installer. Note: The documentation does not say switch to the 'oracle' user here, but runInstaller will not run as root, so I switched to the new oragrid user created earlier.
INFRASTRUCTURE INSTALL: POST-INSTALL REVIEW
There are several interesting things in the post-install steps.
Of particular interest:
WRAP-UP: ASM IS INSTALLED
At this point, ASM has been successfully installed, and we can add and remove disks from disk groups at will. In the next installments we are ready to add a Database to the mix, try out the new-fangled File System, and (oh yeah, shut down and start everything up cleanly). I have 8 or 9 hours invested to this point, and that includes installing the operating system and writing this blog along the way.
About ASM:
If you are unfamiliar with ASM, it is storage management software that comes with the database. It provides the performance of using RAW devices without the need to manage an storage volume through the operating system or a logical volume manager. For the DBA this translates into more direct control over the storage and more efficiency. For the business, this may translate into reduced costs, if it saves on a license for 3rd-party volume management.
You provide ASM with a pool of storage devices, and then tell ASM how to divide that up logically. You can add or remove devices as needed, without interrupting the database, and ASM will automatically spread your data and load across all the devices. This can be a viable substitute for RAID systems, eliminating another layer of software and hardware. (However, if you have a solid RAID strategy in place already, you can use ASM for managing the volumes, and still let the RAID handle redundancy.)
In 11g Release 2 it gets even more interesting with the introduction of ASM Cluster File System (ACFS). ACFS provides an operating-system-usable file system running on top of Oracle ASM. "Why would I do that?" to further simplify the life of the DBA. In 10g, ASM could not be used for files outside the database, such as binaries, log files, etc. With ACFS all of those files can also be moved into ASM storage. You can set up a cluster of machines running ASM and it can pool storage across the entire cluster, and present it to Oracle and/or the filesystem as if it were a big pool of local storage.
The setup:
I'm slow to adopt new hardware at home, so to make this more interesting I am going to be exploring this on a single-CPU Celeron machine, circa 2005. I don't expect speed or performance. I just want to learn the syntax and management. And yes, Oracle will run on this... just don't expect to do much with it besides self-education. This is a home-class machine, so I have 2 hard drives. One with some older data in a Windows partition I want to save, The other a larger drive with a Windows partition, a Linux partition, and a bunch of extra space to play with.
Hardware: Single CPU Celeron
RAM 1.5 GB
Storage: 1 250GB Hard Drive, in multiple partitions.
ASM is designed to pool storage across multiple devices. I don't have multiple devices, but can simulate that for the sake of education by using multiple partitions. I fully expect this to perform slowly since these are on the same physical device.
FIRST STEP: SET UP PARTITIONS AND INSTALL LINUX
I'm going with CentOS 5.3, which is equivalent to RHEL 5. I had previously installed 11g Release 1 on this machine, so I know the Oracle Database will run, at least.
Here is my partition layout, yours may be different.
/dev/hda
/dev/hda1 Windows Partition NTFS 102398 MB
/dev/hda5 / ext3 23454 MB
/dev/hda6-15 10 virtual physical devices LVM PV 10236 MB each
/dev/hda16 swap swap 10236 MB
In a real system I wouldn't lump root, /home, etc all on the same partition, but for this exercise I'm going simple. I made my partitions roughly 10GB in size for simplicity later.
After this I finished the Linux installation, and made sure all the required packages for the Oracle Database were installed, as directed in the 11gR2 Installation Guide for Linux.
NEXT STEP: CONFIGURE LINUX
- Post-install configuration included setting the following options:
- Opened TCP port 1521 in the firewall.
- SELinux Setting set to Permissive
- Created a non-root user (bbontrag, in my case) to login as
- Configured Linux to connect to my local network, and verified Internet access
- Verified that boot parameters are only attaching drives 0 and 4 (/ and swap, or /dev/hda1 and /dev/hda5). This was verified in /boot/grub/grub.conf and by viewing the partitions in CentOS Logical Volume Management.
- Rebooted at this point to ensure networking was fully operational
- Used the Package Manager to verify all required packages from the Oracle Install Guide.
- On a normal development or production system I would install O/S patches at this point. For this machine, I am skipping that step, so I can practice patching AFTER a database is installed. (But that's another post).
In reviewing the documentation for 11gR2 I found this little nugget... "Oracle Database 11g Release 2 introduces the Grid Infrastructure installation. For single instance databases, the grid infrastructure includes Automatic Storage Management, the listener, and Oracle Restart."..."If you want to use grid infrastructure for a standalone server, then you must install the oracle software from the grid infrastructure media before you install the database." In 11gR1 ASM was included on the Database Install media. In this release it is separate.
(What's New in Oracle Database 11g )
(If you are following this as a HOWTO, hold off on this section until AFTER Oracle Grid Infrastructure is installed in the next step. I got ahead fo myself - don't get yourself into the same trouble I did.) I prepared to install the Oracle 11gR2 software according to according to the Quick Install Guide.
- Checked installed packages
- Setup groups and users, since this was the first install of Oracle software
- Set kernel parameters and other configuration parameters
- I created oracle11r2 instead of oracle as the oracle User Id, to allow a future install of previous versions to practice upgrades.
- I had trouble with the default profile script trying to use ksh as the default shell for oracle11r2. I changed the default shell to bash and it worked fine
INFRASTRUCTURE INSTALL: ASM LIBRARIES
I installed ASM according to the Oracle Grid Infrastructure Install Guide.
- created user oragrid, where the instructions called for "oracle". This will allow me to understand the difference between grid infrastructure files and normal database files. (I may get into trouble by doing this... we'll see). Note: after creating new users I had to logout and log back in, to restart the X server. New users did not have authority to run X apps, such as the OUI, so this was my workaround.
- Since permissions for oracle were already set on /u01/app I reset ownership to be what the Infrastructure guide called for.
- Since my installation is not registered with the Unbreakable Linux Network, I had the joy of installing ASMLIB manually, which is a set of Linux libraries for interacting with ASM. The steps are documented in the Install Guide. Follow them carefully, On my first attempt I downloaded the wrong version. On my second attempt I still missed the oraclasm driver for my kernel. There are 3 packages to install, if the install guide is not clear on that.
- Install the oracleasm package first, then the other packages. The oracleasmlib package depends on it.
- Once the packages successfully installed I ran oracleasm configure -i as instructed.
- The install guide is not clear what to enter here. I used user=oragrid, group=oinstall, Start on boot=y, Scan on boot=y
- My first attempt at the following step failed with an error "Instantiating disk: failed". I found through a quick Google search that ASM was not initialized, so I started ASM by issuing oracleasm init. After that the following createdisk commands worked as expected.
- Next I set my empty partitions as candidate groups for ASM.
oracleasm createdisk data3 /dev/hda8
oracleasm createdisk data4 /dev/hda9 oracleasm createdisk data5 /dev/hda10 oracleasm createdisk data6 /dev/hda11
- I only created 6 at first, so I can play with adding more later. Verify with oracleasm listdisks. 6 disks are displayed DISK1-DISK6. Great.
At this point I unzipped my install media, and ran the Grid Infrastructure installer. Note: The documentation does not say switch to the 'oracle' user here, but runInstaller will not run as root, so I switched to the new oragrid user created earlier.
- This is where I hit my first permissions hiccup of having separate IDs for Infrastructure and Database. I happened to have the media files under /home/oracle11r2 not under /oragrid. Once I changed the group on /home/oracle11r2 to oinstall and granted group read/execute on it, oragrid could run the installer.
- Once OUI started, I selected "Install and Configure Grid Infrastructure for a Standalone Server"
- I picked English as my language.
- Next I was prompted to set up DISK GROUPS. Let's review..
- First, I named my disk group ASMDATA, selected Normal redundancy, and selected all 6 devices as Candidate Disks. The failure groups for redundancy may be defined later? We'll find out.
- I set passwords for the SYS and ASMSNP accounts. Let's review again...
- I assigned all the authentication groups to "oinstall" (default).
- Oracle Base = /u01/app/oragrid (default)
- Software Location = /u01/app/oragrid/product/11.2.0/grid (default, Very OFA-ish, no?)
- Inventory Directory = /u01/app/oraInventory (default)
- OUI checked for installation requirements, and here is where my home machine starts to fall short. The Physical Memory check failed... 1.5GB was required, and my system reports 1.48GB. It also did not recognize my swap space.
- There was a package ASM expected (pdksh-5.2.14) which I found, downloaded, and tried to install. I found it conflicted with ksh and bash already installed, so I left things as-is.
- The max file descriptor limit was set too low. OUI was able to generate a fixit script to take care of this one.
- I told OUI to ignore the remaining prerequisite checks and continue anyway... The goal is to learn so if this screws something up I will learn even more because of it.
- I elected to save a repsonse file. I'm not setting up a cluster, but this will be helpful to study, even on this standalone install.
- Installation continued without incident, and I ran the 2 scripts as root when prompted.
- ps -fu oragrid shows all the processes I would expect to see from an Oracle database instance
- Pulling up the System Monitor, I see a little bit of load and memory usage, but nothing that concerns me.
- /etc/mtab now contains a mount point for oracleasmfs
- I modified /etc/oratab to automatically start the instance "+ASM" on startup.
INFRASTRUCTURE INSTALL: POST-INSTALL REVIEW
There are several interesting things in the post-install steps.
Of particular interest:
- In a real system, patching is important. THe first post-install instruction is to apply patches. Do so. I'm not here, as it is out of scope for this exercise. Patching will be tried later.
- The 11gR2 Grid Infrastructure includes the listener. When managing the listener the ORACLE_HOME for the ASM/Grid Infrastructure installation must be used. I have to say I like this. I was never a fan of multiple listeners on a single node. I understood the reasoning of the DBAs who chose to run that way, but disagreed on the need.
- The 11gR2 Grid Infrastructure will support databases as old as 9.2.
- Backup the orainstRoot.sh script in /u01/app/oraInventory for future reference and troubleshooting. OK, done.
- We set up a single disk group during the installation. For installing 11gR2 we will want another disk group for the Flash Recovery Area. I'm already planning to pare down the 6 disks to maybe 4. We still have 4 partitions which we can use for an ACFS playground.
- The ASM utility binaries do not seem to be in the location expected in the postinstallation steps. This symbolic link is a helpful shortcut: ln -s /u01/app/oragrid/product/11.2.0/grid/bin /u01/app/11.2.0/grid/bin
- Before adding new disk grounds, I need to enable to other partitions for ASM, with oracleasm (as root). Done. data7-data10 now exist.
- Logged in as oragrid, I go to /u01/app/11.2.0/grid/bin and run ./asmca to manage disk groups.
- I go to create new group. The HEADER STATUS column shows the state of all the devices available. I see my 4 new partitions PROVISIONED, and if I click "Show All" Member Disks, I see my other 6 have a status of MEMBER
- I created a disk group FRA from DISK7 and DISK8, with normal redundancy.
- I created a disk group FS from DISK9 and DISK10, with normal redundancy.
- We notice on each of these the Usable capacity is half the Size. normal redundancy is mirroring the data.
WRAP-UP: ASM IS INSTALLED
At this point, ASM has been successfully installed, and we can add and remove disks from disk groups at will. In the next installments we are ready to add a Database to the mix, try out the new-fangled File System, and (oh yeah, shut down and start everything up cleanly). I have 8 or 9 hours invested to this point, and that includes installing the operating system and writing this blog along the way.
Labels:
11gR2,
ACFS,
ASM,
install,
Oracle Grid Infrastructure,
storage management
Wednesday, September 2, 2009
A "Real" DBA
A friend asked recently, "Would you be interested/able to show me some of the skills I'd need to be a REAL DBA?" He'd been working in a mix of databases for 10 years, and was comfortable with application-level aspects, such as tables, stored procedures, triggers, and views, but wanted some coaching on lower-level aspects, such as Backup/Recovery and Tuning. Mike, this one's for you!
Some shops I've worked in make a distinction between "Application DBA" and "Infrastructure DBA", where Application DBAs manage schemas and develop applications, and Infrastructure DBAs install the software, perform backups, and keep things running. Other shops I've been in have the "DBA" doing all of the above. (My thoughts on division of responsibilities is a topic for another post). I contend that both are "Real DBAs", each with different specialties, so I'll just use "DBA" from here on out.
It is helpful for a DBA to be familiar with both the application-level and infrastructure-level concepts, regardless of the database platform. If your organization splits responsibilities, you may know more about one or the other in practice, but should at least understand the concepts.
The Application DBA can write more effective SQL and procedures if they have a thorough understanding of how the infrastructure works.
The Infrastructure DBA can perform their job more effectively by understanding the applications that use "their" database.
When it comes to many of the questions above, each database is different. Oracle handles read consistency differently than MySQL, which handles it differently than SQL Server. There are application design implications here, and different performance tuning approaches for each. The DBA needs to understand how the database they service manages data internally, to some degree. Reading the documentation is often the best place to start.
No DBA can know everything, so an effective DBA will have resources in mind where he can get questions answered as needed. Remain teachable and open to learning something new every day.
Down to more technical details, the DBA MUST, absolutely, know backup and recovery strategies for their chosen database. Each product has different methods and semantics, so learn what is appropriate. There may be multiple strategies (hot backups vs. cold backups, vs. exports - which are not really a backup, but that's another post). Understand how your database backup relates to your operating system backup. (For example, that O/S backup of /oradata, in and of itself, is utterly useless if your database is running at the time.)
Always remember, an untested backup is no backup at all. If you are not confident in your ability to restore a backup, you might as well not be taking it. Know how to restore the entire database, as well as subsets of the data... individual data files, individual tables, individual tablespaces, point-in-time recovery.
Performance Tuning is an acquired skill, learned mostly through experience, often with bumps and bruises along the way. In Oracle, at least, there are many tools available. Experiment with each of the tools available, learn their strengths. Use these tools to measure performance in good times and bad, so you can tell the difference. In a performance "crisis" you will likely be facing a situation unlike others you have solved in the past. Be prepared to do critical analysis on the fly. If you are making a change to fix a performance problem, have evidence to support why THIS change is the right choice and not the first parameter tweak that came to mind.
Know how to read your system's log files. Know how to generate diagnostic traces, if your database-of-choice supports it. Practice (in a test database) if you have that luxury, by setting up situations you expect to be a problem, then observing the messages that show up in the logs.
Practice installing the database, especially if you inherited a system already up and running. Perform regular patches. Pay attention to the files and directories that are created and know their purpose. Know what processes are expected to be running and their typical load. To beat the proverbial dead horse, the documentation for your system may be very valuable here.
The most important non-technical skill for the DBA may be communication. Cultivate good working relationships with the development team, and with the system administrators (if that doesn't already include you). Present yourself as an accessible resource, rather than the surly gatekeeper, and you will find a much smoother experience in implementing changes. At the same time, the "surly gatekeeper" hat is sometimes important and necessary - know when it is appropriate in your organization.
As long as this post has been, I'm sure I missed several important areas that could be added. As they come to mind, look for more posts.
Some shops I've worked in make a distinction between "Application DBA" and "Infrastructure DBA", where Application DBAs manage schemas and develop applications, and Infrastructure DBAs install the software, perform backups, and keep things running. Other shops I've been in have the "DBA" doing all of the above. (My thoughts on division of responsibilities is a topic for another post). I contend that both are "Real DBAs", each with different specialties, so I'll just use "DBA" from here on out.
It is helpful for a DBA to be familiar with both the application-level and infrastructure-level concepts, regardless of the database platform. If your organization splits responsibilities, you may know more about one or the other in practice, but should at least understand the concepts.
The Application DBA can write more effective SQL and procedures if they have a thorough understanding of how the infrastructure works.
- How does the database handle read consistency?
- How does it handle transactional integrity?
- What happens if I lose my connection?
- How is it backed up, so I know how to get data back when it, inevitably, gets lost.
- How does it store data on disk, so I can organize my schema objects appropriately?
The Infrastructure DBA can perform their job more effectively by understanding the applications that use "their" database.
- Are some tables more critical than others?
- What load is expected from the application, including over time?
- Does the application rely on version-specific features?
- A knowledge of different SQL techniques can be useful when developers come to you for help in tuning, since much tuning happens in the application and not a magical FAST=TRUE database setting.
- Be able to explain to developers why bind variables are important to both security and performance.
When it comes to many of the questions above, each database is different. Oracle handles read consistency differently than MySQL, which handles it differently than SQL Server. There are application design implications here, and different performance tuning approaches for each. The DBA needs to understand how the database they service manages data internally, to some degree. Reading the documentation is often the best place to start.
No DBA can know everything, so an effective DBA will have resources in mind where he can get questions answered as needed. Remain teachable and open to learning something new every day.
- Documentation is the obvious place to start (did I mention that already? Yes. Good. Can't mention it enough.)
- Online support systems and forums from your database vendor.
- Select other Internet forums that you TRUST. They are not all good, and many are the blind leading the blind.
- Know local resources you can go to... peers, mentors, user groups.
Down to more technical details, the DBA MUST, absolutely, know backup and recovery strategies for their chosen database. Each product has different methods and semantics, so learn what is appropriate. There may be multiple strategies (hot backups vs. cold backups, vs. exports - which are not really a backup, but that's another post). Understand how your database backup relates to your operating system backup. (For example, that O/S backup of /oradata, in and of itself, is utterly useless if your database is running at the time.)
Always remember, an untested backup is no backup at all. If you are not confident in your ability to restore a backup, you might as well not be taking it. Know how to restore the entire database, as well as subsets of the data... individual data files, individual tables, individual tablespaces, point-in-time recovery.
Performance Tuning is an acquired skill, learned mostly through experience, often with bumps and bruises along the way. In Oracle, at least, there are many tools available. Experiment with each of the tools available, learn their strengths. Use these tools to measure performance in good times and bad, so you can tell the difference. In a performance "crisis" you will likely be facing a situation unlike others you have solved in the past. Be prepared to do critical analysis on the fly. If you are making a change to fix a performance problem, have evidence to support why THIS change is the right choice and not the first parameter tweak that came to mind.
Know how to read your system's log files. Know how to generate diagnostic traces, if your database-of-choice supports it. Practice (in a test database) if you have that luxury, by setting up situations you expect to be a problem, then observing the messages that show up in the logs.
Practice installing the database, especially if you inherited a system already up and running. Perform regular patches. Pay attention to the files and directories that are created and know their purpose. Know what processes are expected to be running and their typical load. To beat the proverbial dead horse, the documentation for your system may be very valuable here.
The most important non-technical skill for the DBA may be communication. Cultivate good working relationships with the development team, and with the system administrators (if that doesn't already include you). Present yourself as an accessible resource, rather than the surly gatekeeper, and you will find a much smoother experience in implementing changes. At the same time, the "surly gatekeeper" hat is sometimes important and necessary - know when it is appropriate in your organization.
As long as this post has been, I'm sure I missed several important areas that could be added. As they come to mind, look for more posts.
Labels:
administration,
core skills,
DBA
Subscribe to:
Posts (Atom)