2025年oracle+故障切换,CRS故障切换异常

大家好，我是讯享网，很高兴认识大家。

本帖最后由 ftc007 于 2012-6-7 09:53 编辑

以上是这个过程的CRS日志：

2012-06-05 18:06:55.023 --开始关机

[cssd(10293)]CRS-1654:Clean up of CRSD resources finished successfully.

2012-06-05 18:06:55.024

[cssd(10293)]CRS-1655:CSSD on node oracle02 detected a problem and started to shutdown. --关闭本节点的crs

2012-06-05 18:06:55.035

[/opt/app/11.2.0/grid/bin/orarootagent.bin(10867)]CRS-5822:Agent '/opt/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117

讯享网 {0:1:4} in /opt/app/11.2.0/grid/log/oracle02/agent/crsd/orarootagent_root/orarootagent_root.log.

2012-06-05 18:06:55.034

[/opt/app/11.2.0/grid/bin/oraagent.bin(10864)]CRS-5822:Agent '/opt/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117

{0:2:6} in /opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log.

2012-06-05 18:06:55.117

[ohasd(9758)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle02'.

2012-06-05 18:06:55.195

[cssd(10293)]CRS-1660:The CSS daemon shutdown has completed

2012-06-05 18:06:55.512

[ohasd(9758)]CRS-2765:Resource 'ora.diskmon' has failed on server 'oracle02'.

2012-06-05 18:06:55.531

[ohasd(9758)]CRS-2765:Resource 'ora.evmd' has failed on server 'oracle02'.

2012-06-05 18:06:55.539

[ohasd(9758)]CRS-2765:Resource 'ora.ctssd' has failed on server 'oracle02'.

2012-06-05 18:06:56.154

[crsd(11278)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109

in /opt/app/11.2.0/grid/log/oracle02/crsd/crsd.log.

2012-06-05 18:06:56.528

[ohasd(9758)]CRS-2765:Resource 'ora.cssd' has failed on server 'oracle02'.

2012-06-05 18:06:56.546

[ohasd(9758)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle02'.

2012-06-05 18:06:56.558

[ohasd(9758)]CRS-2765:Resource 'ora.cluster_interconnect.haip' has failed on server 'oracle02'.

2012-06-05 18:06:57.323

[ctssd(11289)]CRS-2402:The Cluster Time Synchronization Service aborted on host oracle02. Details at (:ctss_css_init1

in /opt/app/11.2.0/grid/log/oracle02/ctssd/octssd.log.

2012-06-05 18:06:58.279

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006

" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:06:58.466

[ohasd(9758)]CRS-2765:Resource 'ora.asm' has failed on server 'oracle02'.

2012-06-05 18:07:06.870

[cssd(11333)]CRS-1713:CSSD daemon is started in clustered mode --准备启动crs，在启动之后，它也需要重新检测各节点的健康状况，下面5行，就是这个过程，但比较久，不正常

2012-06-05 18:07:21.817

[ohasd(9758)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'oracle02'.

2012-06-05 18:07:22.548

[cssd(11333)]CRS-1707:Lease acquisition for node oracle02 number 2 completed

2012-06-05 18:07:23.872

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL3; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:07:23.888

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL2; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:07:23.898

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL1; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

^[[A^[[A2012-06-05 18:09:26.767

[cssd(11333)]CRS-1625:Node oracle01, number 1, was manually shut down --节点2来到这里才意识到节点1崩溃，这个好像有点奇怪，其实主要是这里之前花的时间比较多，整个过程花了三分钟，后面都比较快

2012-06-05 18:09:30.670

[cssd(11333)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oracle02 .

2012-06-05 18:10:30.793

[cssd(11333)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00011

in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log

2012-06-05 18:10:34.327

[cssd(11623)]CRS-1713:CSSD daemon is started in clustered mode

2012-06-05 18:10:36.708

[ohasd(9758)]CRS-2765:Resource 'ora.diskmon' has failed on server 'oracle02'.

2012-06-05 18:10:50.103

[cssd(11623)]CRS-1707:Lease acquisition for node oracle02 number 2 completed

2012-06-05 18:10:51.524

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL3; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:51.532

[cssd(11623)]CRS-1625:Node oracle01, number 1, was manually shut down

2012-06-05 18:10:51.552

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL2; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:51.576

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL1; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:59.113

[cssd(11623)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oracle02 .

2012-06-05 18:11:01.715

[ctssd(11714)]CRS-2407:The new Cluster Time Synchronization Service reference node is host oracle02.

2012-06-05 18:11:01.714

[ctssd(11714)]CRS-2401:The Cluster Time Synchronization Service started on host oracle02.

2012-06-05 18:11:02.428

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006

" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:02.680

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006

" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:22.096

[crsd(11870)]CRS-1012:The OCR service started on node oracle02.

2012-06-05 18:11:22.897

[evmd(11286)]CRS-1401:EVMD started on node oracle02.

2012-06-05 18:11:24.284

[crsd(11870)]CRS-1201:CRSD started on node oracle02.

2012-06-05 18:11:25.832

[/opt/app/11.2.0/grid/bin/oraagent.bin(11981)]CRS-5016:Process "/opt/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/opt/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010

" in "/opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:25.842

[/opt/app/11.2.0/grid/bin/oraagent.bin(11981)]CRS-5016:Process "/opt/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/opt/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010

" in "/opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:27.438

2012-06-05 18:11:27.575

[crsd(11870)]CRS-2772:Server 'oracle02' has been assigned to pool 'Free'.

从日志来分析，感觉好像是共享存储的切换比较慢造成的，以下是ocssd的日志信息：

2012-06-05 18:09:25.930: [ CSSD][]clssgmDiscEndpcl: gipcDestroy 0x1faa

2012-06-05 18:09:26.151: [ CSSD][]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0

2012-06-05 18:09:26.767: [ CSSD][]clssnmvDHBValidateNCopy: node 1, oracle01, has a disk HB, but no network HB, DHB has

rcfg , wrtcnt, 26905, LATS , lastSeqNo 26900, uniqueness , timestamp /

2012-06-05 18:09:26.767: [ CSSD][]clssnmReadDskHeartbeat:manual shutdown of nodename oracle01, nodenum 1 epoch

66 msec

2012-06-05 18:09:27.153: [ CSSD][]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0

2012-06-05 18:09:28.155: [ CSSD][]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0

2012-06-05 18:09:29.157: [ CSSD][]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0

2012-06-05 18:09:29.661: [ CSSD][]clssnmSendingThread: sending join msg to all nodes

2012-06-05 18:09:29.661: [ CSSD][]clssnmSendingThread: sent 4 join msgs to all nodes

2012-06-05 18:09:29.940: [ CSSD][]clssscSelect: cookie accept request 0x2aaaac024f00

2012-06-05 18:09:29.940: [ CSSD][]clssgmAllocProc: (0x1cc455c0) allocated

2012-06-05 18:09:29.941: [ CSSD][]clssgmClientConnectMsg: properties of cmProc 0x1cc455c0 - 1,2,3,4,5

2012-06-05 18:09:29.941: [ CSSD][]clssgmClientConnectMsg: Connect from con(0x2039) proc(0x1cc455c0) pid(11200) version

11:2:1:4, properties: 1,2,3,4,5

2012-06-05 18:09:29.941: [ CSSD][]clssgmClientConnectMsg: msg flags 0x0000

2012-06-05 18:09:29.942: [ CSSD][]clssscSelect: cookie accept request 0x1cc455c0

2012-06-05 18:09:29.942: [ CSSD][]clssscevtypSHRCON: getting client with cmproc 0x1cc455c0

2012-06-05 18:09:29.942: [ CSSD][]clssgmRegisterClient: proc(3/0x1cc455c0), client(1/0x1d)

2012-06-05 18:09:29.943: [ CSSD][]clssgmJoinGrock: global grock CRF- new client 0x1d with con 0x2068, requested n

um -1, flags 0x

2012-06-05 18:09:29.943: [ CSSD][]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group inf

ormation has been received from the master; group name CRF-, member number -1, flags 0x

2012-06-05 18:09:29.943: [ CSSD][]clssgmDiscEndpcl: gipcDestroy 0x2068

2012-06-05 18:09:29.943: [ CSSD][]clssgmDeadProc: proc 0x1cc455c0

2012-06-05 18:09:29.944: [ CSSD][]clssgmDestroyProc: cleaning up proc(0x1cc455c0) con(0x2039) skgpid ospid 11200 with

0 clients, refcount 0

2012-06-05 18:09:29.944: [ CSSD][]clssgmDiscEndpcl: gipcDestroy 0x2039

2012-06-05 18:09:30.159: [ CSSD][]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0

2012-06-05 18:09:30.662: [ CSSD][]clssnmRcfgMgrThread: Local Join

2012-06-05 18:09:30.662: [ CSSD][]clssnmLocalJoinEvent: begin on node(2), waittime

2012-06-05 18:09:30.662: [ CSSD][]clssnmLocalJoinEvent: set curtime () for my node

2012-06-05 18:09:30.662: [ CSSD][]clssnmLocalJoinEvent: scanning 32 nodes

2012-06-05 18:09:30.662: [ CSSD][]clssnmLocalJoinEvent: Node oracle01, number 1, was shut down

2012-06-05 18:09:30.664: [ CSSD][]clssnmLocalJoinEvent: Starting initial cluster reconfig

2012-06-05 18:09:30.664: [ CSSD][]clssnmDoSyncUpdate: Initiating sync 0

2012-06-05 18:09:30.664: [ CSSD][]clssscCompareSwapEventValue: changed NMReconfigInProgress val 2, from -1, changes 1

2012-06-05 18:09:30.664: [ CSSD][]clssnmDoSyncUpdate: local disk timeout set to ms, remote disk timeout set to 2

00000

2012-06-05 18:09:30.664: [ CSSD][]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will ta

ke effect when the sync is completed.

从以上描述来看，节点2等待了193秒后才找到共享磁盘，然后才成功把CRS服务起来，感觉可能是这个方向有异常，大家意见怎样？楼上的朋友怎么看？

2025年oracle+故障切换,CRS故障切换异常

相关推荐