Thursday, June 14, 2012

[Repost] Oracle RAC Q&A

http://salaic-dbaoracle.blogspot.tw/2009/04/oracle-rac-q.html

Oracle RAC Q&A

1. Why does node eviction happen in Oracle RAC?

Oracle Clusterware evicts a node when any of the following conditions occurs:
- The node is not pinging via the network heartbeat
- The node is not pinging the voting disk
- The node is hung or too busy to perform the above two tasks

In most cases the cause of the error is written to disk. If no error is recorded, follow Metalink Note 559365.1 and use the diagwait option, which gives the node 10 seconds to write logs to the error log file.

#crsctl set css diagwait 13 -force
#crsctl get css diagwait
#crsctl check crs
#crsctl unset css diagwait -f 



1a. What is misscount (MC) in Oracle RAC?

The Cluster Synchronization Service (CSS) in RAC has a misscount parameter. Its value is the maximum time, in seconds, that a network heartbeat can be missed before CSS enters a cluster reconfiguration to evict the node. The default value is 30 seconds (on Linux, 60 seconds in 10g and 30 seconds in 11g).

2. What is the use of the CSS heartbeat mechanism in Oracle RAC?

The CSS of Oracle Clusterware maintains two heartbeat mechanisms:
1. The disk heartbeat to the voting device, and
2. The network heartbeat across the interconnect (this establishes and confirms valid node membership in the cluster).

Both heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal I/O timeout interval (DTO, Disk TimeOut), in seconds, within which an I/O to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat I/O timeout interval is directly related to the misscount setting: Disk TimeOut (DTO) = misscount (MC) - 15 seconds (some versions differ).

Metalink Note: 294430.1
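
The relationship between the two timeouts can be sketched in a few lines of Python. This is only a toy model of the rule described above, not how CSS is implemented: the names `disk_timeout`, `HeartbeatAges` and `should_evict` are made up for illustration, and the 15-second offset is kept as a parameter because the text notes that some versions differ.

```python
from dataclasses import dataclass

# Assumption: the 15-second offset quoted in the text; some versions differ,
# so it is a parameter rather than a hard-coded constant.
DEFAULT_OFFSET_SECONDS = 15


def disk_timeout(misscount: int, offset: int = DEFAULT_OFFSET_SECONDS) -> int:
    """Return DTO = MC - offset, in seconds."""
    if misscount <= offset:
        raise ValueError("misscount must be larger than the offset")
    return misscount - offset


@dataclass
class HeartbeatAges:
    network_seconds: float  # time since the last network heartbeat
    disk_seconds: float     # time since the last completed voting-disk I/O


def should_evict(ages: HeartbeatAges, misscount: int) -> bool:
    """True if either heartbeat has been missing longer than its timeout."""
    return (ages.network_seconds > misscount
            or ages.disk_seconds > disk_timeout(misscount))
```

With a misscount of 30, `disk_timeout(30)` gives 15, so a voting-disk I/O that has been outstanding for 20 seconds would count as a timeout in this model even though the network heartbeat is healthy.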

3. What happens if latencies to the voting disks are longer?

If I/O latencies to the voting disk are greater than the default Disk TimeOut (DTO), the cluster may experience CSS node evictions.

4. What is CSS misscount?

The CSS misscount is the maximum number of seconds the network heartbeat can be missed before CSS enters a cluster reconfiguration and evicts the node. The default CSS misscount is 30 seconds (60 seconds only for 10g on Linux).

4a. How to change the default CSS misscount value?

1) Shut down CRS on all but one node. For the exact steps, use Note 309542.1.
2) Execute crsctl as root to modify the misscount:
$ORA_CRS_HOME/bin/crsctl set css misscount <n>
where <n> is the maximum I/O latency to the voting disk + 1 second
3) Reboot the node where the adjustment was made
4) Start all the other nodes that were shut down in step 1

Metalink Note: 284752.1
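
The "maximum I/O latency + 1 second" rule from step 2 can be written down as a small helper. This is a sketch under assumptions: the function name `suggested_misscount` and the sample latencies are hypothetical, and rounding the maximum up to a whole second before adding 1 is my own choice so that the result is an integer number of seconds.

```python
import math


def suggested_misscount(latencies_seconds):
    """Return ceil(max observed voting-disk I/O latency) + 1, in seconds."""
    if not latencies_seconds:
        raise ValueError("need at least one latency measurement")
    # Assumption: round up to whole seconds before adding the extra second.
    return math.ceil(max(latencies_seconds)) + 1


# Hypothetical measurements, in seconds.
observed = [12.0, 27.4, 19.9]
value = suggested_misscount(observed)
```

The resulting number is what would be passed to crsctl set css misscount in step 2.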

5. How to start and stop CRS ?

Note: Typically the Oracle clusterware starts up automatically during startup.

10gR1 and R2
------------
cd /etc/init.d
init.crs stop
init.crs start

To enable or disable automatic CRS startup at the next reboot (this does not bring down a running CRS):
init.crs enable
init.crs disable

10gR2 and higher versions only
------------------------------
Start Oracle Clusterware
crsctl start crs

Stop Oracle Clusterware
crsctl stop crs

6. How to move a regular DB to an ASM disk group?

The following steps move regular database files to an ASM disk group.

Assume:
1. Oracle RAC instance is up already
2. DB name to be moved PROD
3. RAC db and normal DB both are in same instance.

1. Install and bring up the Oracle RAC instance and the ASM disk group.
2. Comment out the control file location in the DB you want to move and set control_files to the ASM disk group name.
ex. control_files='+DATA_GRP'

3. SQL> startup nomount
SQL> show parameter control_files => shows the new disk group

4. Use the RMAN restore command to copy the control file from regular disk to ASM.
rman
rman> connect target
rman> restore controlfile from '/u01/oracle/PROD/cntrl01.ctl';

5. Verify using asmcmd
asmcmd> cd DSK_GRP/DATA_GRP/PROD
asmcmd> ls => you can see new controlfile under PROD directory.

6. Now mount the DB
sqlplus "/as sysdba"
sql> alter database mount;

7. Now use RMAN to move the data files.
rman
connect target (connected to PROD)
rman> backup as copy database format '+DATA_GRP';
Note: you can use asmcmd to monitor the data file movements to ASM.

8. rman> switch database to copy;

9. sqlplus "/as sysdba" ; alter database open;

10. select * from v$datafile;
select * from v$tempfile;
select * from v$controlfile;
select * from v$logfile;

11. sql> alter database drop logfile '/u01/.../redo01.log';
sql> alter database add logfile '+DATA_GRP';

Note: Repeat this step for every log file except the one currently in use.
Use select * from v$log; to find which one is current.

12. alter system switch logfile;
Then drop the log file that was current before the switch.

13. Edit init.ora and put the full path of the control file so that the DB starts properly.
*.control_files='+DATA_GRP/PROD/controlfile/current.333.433.3333'

14. Edit init.ora => change the archive log location to ASM
*.log_archive_dest_1='LOCATION=+DATA_GRP/PROD' => if you omit PROD it will not work properly.

15. alter system switch logfile; => the new archive log now goes to ASM.

16. END

------------------

3. What are a NIC and an HBA?

Oracle RAC requires a NIC or an HBA, which enables the computer to talk to a network or to a storage subsystem.

HBAs come in different speeds: 1 Gbit/s, 2 Gbit/s, 4, 8, 10, 20 Gbit/s.

An HBA has a unique World Wide Name (WWN), which is similar to an Ethernet MAC address in that it uses an Organizationally Unique Identifier (OUI) assigned by the IEEE.

4. What is TPS?

http://www.dba-oracle.com/m_transactions_per_second.htm

==

5. What is the use of crs_getperm command ?
Used to get permission information.

crs_getperm
Usage: crs_getperm resource_name [-u user|-g group] [-q]

crs_getperm ora.dudb.dudb1.inst
Name: ora.dudb.dudb1.inst
owner:oracle:rwx,pgrp:oinstall:rwx,other::r--,
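
The permission line printed by crs_getperm is a comma-separated list of role:name:mode entries, so it is easy to split apart in a script. The following Python sketch assumes exactly the layout shown in the sample output above; the function name `parse_crs_perms` is made up for illustration.

```python
def parse_crs_perms(line: str) -> dict:
    """Split 'owner:oracle:rwx,pgrp:oinstall:rwx,other::r--,' into a dict."""
    perms = {}
    for entry in line.strip().split(","):
        if not entry:
            continue  # skip the empty piece left by the trailing comma
        role, name, mode = entry.split(":")
        perms[role] = {"name": name, "mode": mode}
    return perms


# Sample line copied from the crs_getperm output above.
sample = "owner:oracle:rwx,pgrp:oinstall:rwx,other::r--,"
perms = parse_crs_perms(sample)
```

Note that the "other" entry has an empty name field, which the parser keeps as an empty string.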

==

6. What is the use of crs_profile?

Used to create, validate, delete and update a profile for RAC.


==

7. Where will you check for RAC log files?

==

8. What is OCFS ?

===

8a. What is OCR?

- It is a binary file that stores configuration and status information, much like the Windows registry.
- Maintained by the CRS daemon.
- Can be mirrored in 10gR2.
- Includes configuration information for the DB, ASM, services, VIP, listener, etc.


8b. What is a voting disk?
- Stores node membership information.
- Used by CSS during split-brain scenarios (two nodes trying to do the same task).
- Used to determine RAC instance membership.

8c. What is a VIP?
- All applications connect using the VIP.

==

9. What is Oracle Clusterware?

a. It is a framework that contains application modeling logic.
It invokes application-aware agents.
It performs resource recovery. When a node goes down, the Clusterware framework recovers the application by relocating its resources to a live node.

This can be done for non-Oracle applications as well, for example xclock.

b. Clusterware also hosts the OCR cache.

Oracle Clusterware requires two components: a voting disk to record node membership information, and the Oracle Cluster Registry/Repository (OCR) to record cluster configuration information. The voting disk and the OCR must reside on shared storage.
 


==

10. What is a resource?

A resource is an application managed by Oracle Clusterware.
The 'profile attributes' of a resource are stored in the Oracle Cluster Registry.

11. What is OCR?
Oracle Cluster Registry (OCR) is a component of the Oracle Clusterware framework.
It stores profile attribute information.
Oracle RAC consists of a series of resources.
Other applications can also be treated as resources.
OCR contains information pertaining to instance-to-node mapping.
You can't have more than two OCRs.


11. How to register a resource ?

a. Use crs_profile to create a .CAP file with the configuration details.
b. Use crs_register to read the .CAP file and update the OCR.
c. Resources can have dependencies. Dependent resources start in order and fail over as a single unit.
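
"Start in order" means that a resource is started only after everything it depends on. A minimal Python sketch of that ordering, using the standard-library graphlib module (Python 3.9+), is shown here. It illustrates the idea only and is not how Clusterware computes its order; the resource names "net", "vip" and "app" are hypothetical.

```python
from graphlib import TopologicalSorter


def start_order(requires):
    """requires maps each resource to the resources it depends on.

    Returns a list in which every resource appears after its dependencies.
    """
    return list(TopologicalSorter(requires).static_order())


# Hypothetical profile: the application needs a VIP, the VIP needs a network.
requires = {"app": ["vip"], "vip": ["net"], "net": []}
order = start_order(requires)        # dependencies first
stop_order = list(reversed(order))   # stopping goes the other way round
```

For this simple chain the start order is net, vip, app, and the stop order is the reverse.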

12. What do crs_start and crs_stop do?

They read the configuration from the OCR and call the agent with the command 'start' or 'stop'.
The agent (which can be user-written) actually starts or stops the resource.

crs_start => reads OCR config info => calls the 'control agent' with 'start' => the control agent starts the resource.

crs_stop => reads OCR config info => calls the 'control agent' with 'stop' => the control agent stops the resource.

==

13. Question: Using the crs_start command to start/stop services.

As per Oracle documentation.....
2) Oracle® Database Oracle Clusterware and Oracle Real Application
Clusters Administration and Deployment Guide
10g Release 2 (10.2)
Part Number B14197-03
Page 260 says

"Note: Do not use the Oracle Clusterware commands crs_register, crs_profile,
crs_start or crs_stop on resources with names beginning with the prefix "ora"
unless either Oracle Support asks you to, or unless Oracle has certified you as
described in http://metalink.oracle.com. Server Control (SRVCTL) is the correct
utility to use on Oracle resources. You can create resources that depend on
resources that Oracle has defined. You can also use the Oracle Clusterware commands to
inspect the configuration and status."

==


14. What is the difference between Oracle Clusterware and CRS ?

Oracle Clusterware was formerly known as Cluster Ready Services (CRS). It is an integrated cluster management solution that enables you to link multiple servers so that they function as a single system, or cluster. Oracle Clusterware simplifies the infrastructure required for RAC because it is integrated with the Oracle Database. In addition, Oracle Clusterware is available for use with single-instance databases and applications that you deploy on clusters.

Note: The commands starting with crs_ are still valid and unchanged.

==

15. What is 'Split brain Syndrome' ?

The Oracle Clusterware manages node membership and 'prevents' 'split brain syndrome' in which two or more instances attempt to control the database. This can occur in cases where there is a break in communication between nodes through the interconnect.

16. What is Oracle's recommendation for the interconnect?

Oracle recommends that you configure a redundant interconnect to prevent the interconnect from being a single point of failure.
 

Oracle also recommends that you use User Datagram Protocol (UDP) on a Gigabit Ethernet for your cluster interconnect.
 
Crossover cables are not supported for use with Oracle Clusterware or RAC databases.

17. List the commands used to manage RAC ?

crs_profile

crs_register

crs_relocate

crs_getperm

crs_setperm

crs_stat

srvctl
 

--
crsctl
 

$crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

$crsctl check cssd
CSS appears healthy

$crsctl check evmd
EVM appears healthy
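
When these checks are run from a monitoring script, the output can be scanned for any daemon that is not reported as healthy. The Python sketch below assumes only the "<name> appears healthy" lines shown above; it treats any other non-empty line as a problem, since the exact wording of failure messages is not shown here. The function name `unhealthy_daemons` is made up for illustration.

```python
def unhealthy_daemons(output: str) -> list:
    """Return the first word of every line that does not end in 'appears healthy'."""
    bad = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.endswith("appears healthy"):
            bad.append(line.split()[0])
    return bad


# Sample copied from the crsctl check crs output above.
sample = """CSS appears healthy
CRS appears healthy
EVM appears healthy"""
problems = unhealthy_daemons(sample)
```

An empty list means that every daemon reported itself as healthy.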

crsctl add css votedisk <path> - adds a new voting disk
crsctl delete css votedisk <path> - removes a voting disk
crsctl enable crs - enables startup for all CRS daemons
crsctl disable crs - disables startup for all CRS daemons
crsctl start crs - starts all CRS daemons.
crsctl stop crs - stops all CRS daemons. Stops CRS resources

$ crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.1.0]

--
ocrdump


--
ocrconfig

dsudsbs1:oracle$ ocrconfig -showbackup
dsudsbs1 2009/11/25 19:42:50 /opt/crs/oracle/product/10.2/app/cdata/crs
dsudsbs1 2009/11/25 15:42:49 /opt/crs/oracle/product/10.2/app/cdata/crs
dsudsbs1 2009/11/25 11:42:49 /opt/crs/oracle/product/10.2/app/cdata/crs
dsudsbs1 2009/11/24 19:42:47 /opt/crs/oracle/product/10.2/app/cdata/crs
dsudsbs1 2009/11/12 19:42:12 /opt/crs/oracle/product/10.2/app/cdata/crs
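
Each line of the -showbackup listing carries a node name, a date, a time and a directory, so the listing can be parsed to find the most recent backup. The Python sketch below assumes exactly the four-column layout shown above and skips any line that does not fit it; the function name `parse_showbackup` is made up for illustration.

```python
from datetime import datetime


def parse_showbackup(output: str) -> list:
    """Return (node, timestamp, directory) tuples from ocrconfig -showbackup."""
    backups = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # prompt lines, blank lines, anything unexpected
        node, day, clock, directory = parts
        try:
            taken = datetime.strptime(f"{day} {clock}", "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # four words, but not a date and a time
        backups.append((node, taken, directory))
    return backups


# Sample copied from the listing above.
sample = """dsudsbs1:oracle$ ocrconfig -showbackup
dsudsbs1 2009/11/25 19:42:50 /opt/crs/oracle/product/10.2/app/cdata/crs
dsudsbs1 2009/11/25 15:42:49 /opt/crs/oracle/product/10.2/app/cdata/crs
dsudsbs1 2009/11/25 11:42:49 /opt/crs/oracle/product/10.2/app/cdata/crs
dsudsbs1 2009/11/24 19:42:47 /opt/crs/oracle/product/10.2/app/cdata/crs
dsudsbs1 2009/11/12 19:42:12 /opt/crs/oracle/product/10.2/app/cdata/crs"""
backups = parse_showbackup(sample)
newest = max(backups, key=lambda b: b[1])
```

In the sample, the three most recent entries are about four hours apart, followed by an older daily copy and an older weekly one.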

ocrconfig -repair ocr
 
ocrconfig -replace
ocrconfig -export/-import
ocrconfig -upgrade
--
ocrcheck - no param needed.

$ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 0
Used space (kbytes) : 4588
Available space (kbytes) : 4294962708
ID : 1014742862
Device/File Name : /dev/rdsk/c5t0d3
Device/File integrity check succeeded
Device/File Name : /dev/rdsk/c5t0d4
Device/File integrity check succeeded

Cluster registry integrity check succeeded
--




18. How to back up the voting disk?

Use the dd command to back it up.

dd if=voting_disk_file of=backup_vt_file

On Windows, use ocopy.

To add and remove voting disks, use crsctl:

crsctl add css votedisk voting_disk_path

crsctl delete css votedisk voting_disk_path

If your cluster is down, use the -force option:

crsctl add css votedisk voting_disk_path -force

==

19. How to find location of voting disk ?

option 1:

crsctl query css votedisk
0. 0 /dev/rdsk/c5t0d5
1. 0 /dev/rdsk/c5t0d6
2. 0 /dev/rdsk/c5t0d7

located 3 votedisk(s).

option 2:
Take an OCR dump.
ocrdump -stdout -keyname SYSTEM.css.diskfile
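
The crsctl query css votedisk output shown under option 1 can be reduced to a plain list of device paths, for example to feed a backup script. The Python sketch below assumes the three-column "index. number path" layout of that sample; the function name `votedisk_paths` is made up for illustration.

```python
def votedisk_paths(output: str) -> list:
    """Extract the device paths from crsctl query css votedisk output."""
    paths = []
    for line in output.splitlines():
        parts = line.split()
        # Data lines look like: "0.  0  /dev/rdsk/c5t0d5"
        if len(parts) == 3 and parts[0].rstrip(".").isdigit():
            paths.append(parts[2])
    return paths


# Sample copied from option 1 above.
sample = """0. 0 /dev/rdsk/c5t0d5
1. 0 /dev/rdsk/c5t0d6
2. 0 /dev/rdsk/c5t0d7

located 3 votedisk(s)."""
paths = votedisk_paths(sample)
```

The summary line "located 3 votedisk(s)." also has three words, but its first word is not a number, so it is skipped.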


20. What is CRS?

21. What are the log file locations for RAC ?

cd $ORACLE_HOME/log/<hostname>/client
-- when you execute commands such as oifcfg, ocrconfig, etc.,
-- a log file is created here.


cd $ORACLE_HOME/log/<hostname>/crsd

cd $ORACLE_HOME/log/<hostname>/racg


==
22. How to backup OCR ?

Oracle Clusterware backs up the Oracle Cluster Registry (OCR) automatically. It creates an OCR backup every four hours and always retains the last three copies. The CRSD process that creates these backups also creates and retains one OCR backup for each full day and, at the end of each week, one complete backup for the week. You cannot alter the backup frequencies. As the DBA, you should copy these generated backup files at least once a day to a device other than the one where the primary OCR resides. The files are located at %CRS_home/cdata/my_cluster.
 


==
23. How to find location of OCR?

ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 0
Used space (kbytes) : 4588
Available space (kbytes) : 4294962708
ID : 1014742862
Device/File Name : /dev/rdsk/c5t0d3
Device/File integrity check succeeded
Device/File Name : /dev/rdsk/c5t0d4
Device/File integrity check succeeded

Cluster registry integrity check succeeded
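
The OCR locations are the "Device/File Name" lines of this output, each followed by its own integrity result. A small Python sketch that pulls them out is shown below. It assumes the layout of the sample above, and the function name `ocr_devices` is made up for illustration.

```python
def ocr_devices(output: str) -> list:
    """Return (path, integrity_ok) for each Device/File Name line in ocrcheck output."""
    lines = [line.strip() for line in output.splitlines()]
    devices = []
    for i, line in enumerate(lines):
        if line.startswith("Device/File Name"):
            path = line.split(":", 1)[1].strip()
            # The integrity result is expected on the very next line.
            ok = (i + 1 < len(lines)
                  and lines[i + 1] == "Device/File integrity check succeeded")
            devices.append((path, ok))
    return devices


# Sample copied from the ocrcheck output above.
sample = """Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 0
Used space (kbytes) : 4588
Available space (kbytes) : 4294962708
ID : 1014742862
Device/File Name : /dev/rdsk/c5t0d3
Device/File integrity check succeeded
Device/File Name : /dev/rdsk/c5t0d4
Device/File integrity check succeeded

Cluster registry integrity check succeeded"""
devices = ocr_devices(sample)
```

Here the result lists the primary OCR device and its mirror, both with a successful integrity check.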


24. How to restore the OCR file if it is corrupted?

Do the following to restore the OCR on Unix/Linux systems.

List the backups: ocrconfig -showbackup

Check the contents of a backup: ocrdump -backupfile my_file

Go to the bin directory and stop CRS on all nodes: crsctl stop crs

Perform the restore: ocrconfig -restore my_file

Restart CRS on all nodes: crsctl start crs

Then use the Cluster Verification Utility (CVU) to check the OCR's integrity. For verbose output covering all of the nodes, run: cluvfy comp ocr -n all -verbose
 

==

25. How to compare all nodes with cluvfy?

cluvfy comp ocr -n all [-verbose]
 


oracle$ cluvfy comp ocr -n all

Verifying OCR integrity
Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.
dsudsbs1:oracle$
==

26. How to manage ASM?

Administering ASM Instances with SRVCTL in RAC
Use the following command to add configuration information about an existing ASM instance:

srvctl add asm -n mynode_name -i myasm_instance_name -o myoracle_home

If you omit the -i option, the changes are propagated throughout the entire ASM instance pool.

To remove an ASM instance, use the following syntax:

srvctl remove asm -n mynode_name [-i myasm_instance_name]

To enable an ASM instance, use the following syntax:

srvctl enable asm -n mynode_name [-i myasm_instance_name]

To disable an ASM instance, use the following syntax:

srvctl disable asm -n mynode_name [-i myasm_instance_name]

You can also use the SRVCTL utility to start, stop, and get the status of an ASM instance. See the examples below.

To start an ASM instance:

srvctl start asm -n mynode_name [-i myasm_instance_name] [-o start_options] [-c connect_str | -q]

To stop an ASM instance:

srvctl stop asm -n mynode_name [-i myasm_instance_name] [-o stop_options] [-c connect_str | -q]

To list the configuration of an ASM instance:

srvctl config asm -n mynode_name

To get the status of an ASM instance:

srvctl status asm -n mynode_name

==

27. How to start and stop RAC ?

Starting Up and Shutting Down with SRVCTL
A quick syntax recap of SRVCTL. To start an instance:

srvctl start instance -d mydb -i "myinstance_list" [-o start_options] [-c connect_str | -q]

To stop an instance:

srvctl stop instance -d mydb -i "myinstance_list" [-o stop_options] [-c connect_str | -q]

To start and stop the entire RAC cluster database, meaning all of the instances, you will do the following from your SRVCTL in the command line:

srvctl start database -d mydb [-o start_options] [-c connect_str | -q]

srvctl stop database -d mydb [-o stop_options] [-c connect_str | -q]

There are several options and we will look at all of them in upcoming articles in RAC administration

==

28. How to take an ocrdump?

Log in as root.
Type ocrdump. It creates a file named OCRDUMPFILE.

Use vi to view the dump.

If you run it again while the file exists, you get an error:
# ocrdump
PROT-303: Dump file already exists [OCRDUMPFILE]

==

 

 
