Oracle failover: abnormal CRS failover


This post was last edited by ftc007 on 2012-6-7 09:53

Here is the CRS log for this process:

2012-06-05 18:06:55.023    -- shutdown begins

[cssd(10293)]CRS-1654:Clean up of CRSD resources finished successfully.

2012-06-05 18:06:55.024

[cssd(10293)]CRS-1655:CSSD on node oracle02 detected a problem and started to shutdown.  -- CRS on this node shuts down

2012-06-05 18:06:55.035

[/opt/app/11.2.0/grid/bin/orarootagent.bin(10867)]CRS-5822:Agent '/opt/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:1:4} in /opt/app/11.2.0/grid/log/oracle02/agent/crsd/orarootagent_root/orarootagent_root.log.

2012-06-05 18:06:55.034

[/opt/app/11.2.0/grid/bin/oraagent.bin(10864)]CRS-5822:Agent '/opt/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:2:6} in /opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log.

2012-06-05 18:06:55.117

[ohasd(9758)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle02'.

2012-06-05 18:06:55.195

[cssd(10293)]CRS-1660:The CSS daemon shutdown has completed

2012-06-05 18:06:55.512

[ohasd(9758)]CRS-2765:Resource 'ora.diskmon' has failed on server 'oracle02'.

2012-06-05 18:06:55.531

[ohasd(9758)]CRS-2765:Resource 'ora.evmd' has failed on server 'oracle02'.

2012-06-05 18:06:55.539

[ohasd(9758)]CRS-2765:Resource 'ora.ctssd' has failed on server 'oracle02'.

2012-06-05 18:06:56.154

[crsd(11278)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109:) in /opt/app/11.2.0/grid/log/oracle02/crsd/crsd.log.

2012-06-05 18:06:56.528

[ohasd(9758)]CRS-2765:Resource 'ora.cssd' has failed on server 'oracle02'.

2012-06-05 18:06:56.546

[ohasd(9758)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle02'.

2012-06-05 18:06:56.558

[ohasd(9758)]CRS-2765:Resource 'ora.cluster_interconnect.haip' has failed on server 'oracle02'.

2012-06-05 18:06:57.323

[ctssd(11289)]CRS-2402:The Cluster Time Synchronization Service aborted on host oracle02. Details at (:ctss_css_init1:) in /opt/app/11.2.0/grid/log/oracle02/ctssd/octssd.log.

2012-06-05 18:06:58.279

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:06:58.466

[ohasd(9758)]CRS-2765:Resource 'ora.asm' has failed on server 'oracle02'.

2012-06-05 18:07:06.870

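[cssd(11333)]CRS-1713:CSSD daemon is started in clustered mode  -- CRS is preparing to start. After starting, it also has to re-check the health of each node; the next 5 entries are that process, but it takes rather long, which is not normal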

2012-06-05 18:07:21.817

[ohasd(9758)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'oracle02'.

2012-06-05 18:07:22.548

[cssd(11333)]CRS-1707:Lease acquisition for node oracle02 number 2 completed

2012-06-05 18:07:23.872

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL3; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:07:23.888

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL2; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:07:23.898

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL1; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:09:26.767

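[cssd(11333)]CRS-1625:Node oracle01, number 1, was manually shut down     -- only at this point does node 2 realize that node 1 has gone down, which seems a bit odd. Most of the time was spent before this point; the whole process took three minutes, and everything after this is fairly quick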

2012-06-05 18:09:30.670

[cssd(11333)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oracle02 .

2012-06-05 18:10:30.793

[cssd(11333)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00011:) in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log

2012-06-05 18:10:34.327

[cssd(11623)]CRS-1713:CSSD daemon is started in clustered mode

2012-06-05 18:10:36.708

[ohasd(9758)]CRS-2765:Resource 'ora.diskmon' has failed on server 'oracle02'.

2012-06-05 18:10:50.103

[cssd(11623)]CRS-1707:Lease acquisition for node oracle02 number 2 completed

2012-06-05 18:10:51.524

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL3; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:51.532

[cssd(11623)]CRS-1625:Node oracle01, number 1, was manually shut down

2012-06-05 18:10:51.552

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL2; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:51.576

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL1; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:59.113

[cssd(11623)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oracle02 .

2012-06-05 18:11:01.715

[ctssd(11714)]CRS-2407:The new Cluster Time Synchronization Service reference node is host oracle02.

2012-06-05 18:11:01.714

[ctssd(11714)]CRS-2401:The Cluster Time Synchronization Service started on host oracle02.

2012-06-05 18:11:02.428

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:02.680

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:22.096

[crsd(11870)]CRS-1012:The OCR service started on node oracle02.

2012-06-05 18:11:22.897

[evmd(11286)]CRS-1401:EVMD started on node oracle02.

2012-06-05 18:11:24.284

[crsd(11870)]CRS-1201:CRSD started on node oracle02.

2012-06-05 18:11:25.832

[/opt/app/11.2.0/grid/bin/oraagent.bin(11981)]CRS-5016:Process "/opt/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/opt/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:25.842

[/opt/app/11.2.0/grid/bin/oraagent.bin(11981)]CRS-5016:Process "/opt/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/opt/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:27.438

[/opt/app/11.2.0/grid/bin/oraagent.bin(11981)]CRS-5016:Process "/opt/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/opt/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:27.575

[crsd(11870)]CRS-2772:Server 'oracle02' has been assigned to pool 'Free'.
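One way to read a log like this is to group the message codes by daemon PID, which makes restarts easy to spot. The following is only a minimal sketch: `SAMPLE`, `HEADER`, and `codes_by_pid` are names made up for illustration, and `SAMPLE` holds just a handful of the cssd lines from the log above, not the whole file.

```python
import re
from collections import OrderedDict

# A few cssd message lines copied from the CRS log above (a subset, for illustration).
SAMPLE = [
    "[cssd(10293)]CRS-1660:The CSS daemon shutdown has completed",
    "[cssd(11333)]CRS-1713:CSSD daemon is started in clustered mode",
    "[cssd(11333)]CRS-1656:The CSS daemon is terminating due to a fatal error",
    "[cssd(11623)]CRS-1713:CSSD daemon is started in clustered mode",
    "[cssd(11623)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oracle02 .",
]

# Matches "[name(pid)]CRS-nnnn:" at the start of a message line.
HEADER = re.compile(r"^\[(?P<proc>[^\]()]+)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):")


def codes_by_pid(lines, proc="cssd"):
    """Map each PID of the given daemon to its message codes, in log order."""
    result = OrderedDict()
    for line in lines:
        match = HEADER.match(line)
        if match and match.group("proc") == proc:
            result.setdefault(int(match.group("pid")), []).append(match.group("code"))
    return result


if __name__ == "__main__":
    for pid, codes in codes_by_pid(SAMPLE).items():
        print(pid, codes)
```

On these sample lines the grouping shows three distinct cssd PIDs, with PID 11333 starting (CRS-1713) and then terminating on a fatal error (CRS-1656) before PID 11623 starts.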

Judging from the log, it looks as if slow switching of the shared storage caused this. Here is the ocssd log output:

2012-06-05 18:09:25.930: [    CSSD][]clssgmDiscEndpcl: gipcDestroy 0x1faa

2012-06-05 18:09:26.151: [    CSSD][]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:26.767: [    CSSD][]clssnmvDHBValidateNCopy: node 1, oracle01, has a disk HB, but no network HB, DHB has rcfg , wrtcnt, 26905, LATS , lastSeqNo 26900, uniqueness , timestamp /

2012-06-05 18:09:26.767: [    CSSD][]clssnmReadDskHeartbeat:manual shutdown of nodename oracle01, nodenum 1 epoch

66 msec

2012-06-05 18:09:27.153: [    CSSD][]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:28.155: [    CSSD][]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:29.157: [    CSSD][]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:29.661: [    CSSD][]clssnmSendingThread: sending join msg to all nodes

2012-06-05 18:09:29.661: [    CSSD][]clssnmSendingThread: sent 4 join msgs to all nodes

2012-06-05 18:09:29.940: [    CSSD][]clssscSelect: cookie accept request 0x2aaaac024f00

2012-06-05 18:09:29.940: [    CSSD][]clssgmAllocProc: (0x1cc455c0) allocated

2012-06-05 18:09:29.941: [    CSSD][]clssgmClientConnectMsg: properties of cmProc 0x1cc455c0 - 1,2,3,4,5

2012-06-05 18:09:29.941: [    CSSD][]clssgmClientConnectMsg: Connect from con(0x2039) proc(0x1cc455c0) pid(11200) version 11:2:1:4, properties: 1,2,3,4,5

2012-06-05 18:09:29.941: [    CSSD][]clssgmClientConnectMsg: msg flags 0x0000

2012-06-05 18:09:29.942: [    CSSD][]clssscSelect: cookie accept request 0x1cc455c0

2012-06-05 18:09:29.942: [    CSSD][]clssscevtypSHRCON: getting client with cmproc 0x1cc455c0

2012-06-05 18:09:29.942: [    CSSD][]clssgmRegisterClient: proc(3/0x1cc455c0), client(1/0x1d)

2012-06-05 18:09:29.943: [    CSSD][]clssgmJoinGrock: global grock CRF- new client 0x1d with con 0x2068, requested num -1, flags 0x

2012-06-05 18:09:29.943: [    CSSD][]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group information has been received from the master; group name CRF-, member number -1, flags 0x

2012-06-05 18:09:29.943: [    CSSD][]clssgmDiscEndpcl: gipcDestroy 0x2068

2012-06-05 18:09:29.943: [    CSSD][]clssgmDeadProc: proc 0x1cc455c0

2012-06-05 18:09:29.944: [    CSSD][]clssgmDestroyProc: cleaning up proc(0x1cc455c0) con(0x2039) skgpid  ospid 11200 with 0 clients, refcount 0

2012-06-05 18:09:29.944: [    CSSD][]clssgmDiscEndpcl: gipcDestroy 0x2039

2012-06-05 18:09:30.159: [    CSSD][]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:30.662: [    CSSD][]clssnmRcfgMgrThread: Local Join

2012-06-05 18:09:30.662: [    CSSD][]clssnmLocalJoinEvent: begin on node(2), waittime

2012-06-05 18:09:30.662: [    CSSD][]clssnmLocalJoinEvent: set curtime () for my node

2012-06-05 18:09:30.662: [    CSSD][]clssnmLocalJoinEvent: scanning 32 nodes

2012-06-05 18:09:30.662: [    CSSD][]clssnmLocalJoinEvent: Node oracle01, number 1, was shut down

2012-06-05 18:09:30.664: [    CSSD][]clssnmLocalJoinEvent: Starting initial cluster reconfig

2012-06-05 18:09:30.664: [    CSSD][]clssnmDoSyncUpdate: Initiating sync 0

2012-06-05 18:09:30.664: [    CSSD][]clssscCompareSwapEventValue: changed NMReconfigInProgress  val 2, from -1, changes 1

2012-06-05 18:09:30.664: [    CSSD][]clssnmDoSyncUpdate: local disk timeout set to ms, remote disk timeout set to 200000

2012-06-05 18:09:30.664: [    CSSD][]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.

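From the above, node 2 waited 193 seconds before it found the shared disks, and only then did it bring the CRS service up successfully. I suspect the problem lies in this direction. What does everyone think? How do the posters above see it?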
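To check where the time actually goes, the gaps between consecutive CRS log entries can be measured rather than estimated by eye. The sketch below is only an illustration: `SAMPLE`, `parse_entries`, and `largest_gaps` are hypothetical names, and `SAMPLE` contains just four entries copied from the CRS log above. It assumes the layout shown there, with a timestamp on one line and the message on the next.

```python
from datetime import datetime

# Four entries copied from the CRS log above (not the full log).
SAMPLE = """\
2012-06-05 18:07:06.870
[cssd(11333)]CRS-1713:CSSD daemon is started in clustered mode
2012-06-05 18:07:23.898
[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL1
2012-06-05 18:09:26.767
[cssd(11333)]CRS-1625:Node oracle01, number 1, was manually shut down
2012-06-05 18:09:30.670
[cssd(11333)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oracle02 .
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_entries(text):
    """Pair each timestamp line with the message line that follows it."""
    entries = []
    pending = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            pending = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a bare timestamp: treat it as the message for the pending timestamp.
            if pending is not None:
                entries.append((pending, line))
                pending = None
    return entries


def largest_gaps(entries, top=3):
    """Return (seconds, previous message, next message), largest gap first."""
    gaps = []
    for (t0, m0), (t1, m1) in zip(entries, entries[1:]):
        gaps.append(((t1 - t0).total_seconds(), m0, m1))
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


if __name__ == "__main__":
    for seconds, before, after in largest_gaps(parse_entries(SAMPLE)):
        print(f"{seconds:8.3f}s  {before[:45]} -> {after[:45]}")
```

On these four entries the largest gap is about 123 seconds, between the last voting file coming online (18:07:23.898) and CRS-1625 for oracle01 (18:09:26.767). Running the same functions over the complete alert log would show whether any other interval is comparable.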
