I am hoping someone can give me pointers as to where I'm going wrong in clustering 3 servers with MySQL Cluster 7.1 using multiple management nodes.
Currently, the cluster works perfectly with one management node. This is the setup:
- First server runs only an instance of ndb_mgmd (192.168.66.114)
- Second server runs an instance of ndbd and mysqld (192.168.66.2)
- Third server runs an instance of ndbd and mysqld (192.168.66.113)
I want to introduce a second management node into the cluster. I have exactly the same config.ini on both management servers. Here it is:
[NDBD DEFAULT]
NoOfReplicas=2
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
PortNumber=1186
datadir=c:/Progra~1/mysql-cluster-gpl-7.1.3-win32
LogDestination=FILE:filename=c:/Progra~1/mysql-cluster-gpl-7.1.3-win32/clusterlog.log
[TCP DEFAULT]
# Management Server
[NDB_MGMD]
Id=1
HostName=192.168.66.114
ArbitrationRank=1
[NDB_MGMD]
Id=6
HostName=192.168.66.2
ArbitrationRank=2
# Storage Engines
[NDBD]
Id=2
HostName=192.168.66.2
DataDir= D:/AppData/ndb-7.1.3
[NDBD]
Id=3
HostName=192.168.66.113
DataDir= D:/AppData/ndb-7.1.3
[MYSQLD]
Id=4
HostName=192.168.66.2
[MYSQLD]
Id=5
HostName=192.168.66.113
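For completeness, this is roughly how the two ndb_mgmd instances get started with this config.ini. Treat it as a sketch: --config-file, --configdir and --ndb-nodeid are standard ndb_mgmd options, but the exact paths and invocations below are illustrative rather than copied verbatim from my setup:

REM on 192.168.66.114 (management node 1)
c:\Progra~1\mysql-cluster-gpl-7.1.3-win32\bin\ndb_mgmd --ndb-nodeid=1 --config-file=c:\Progra~1\mysql-cluster-gpl-7.1.3-win32\config.ini --configdir=c:\Progra~1\mysql-cluster-gpl-7.1.3-win32

REM on 192.168.66.2 (management node 6), with an identical config.ini
c:\Progra~1\mysql-cluster-gpl-7.1.3-win32\bin\ndb_mgmd --ndb-nodeid=6 --config-file=c:\Progra~1\mysql-cluster-gpl-7.1.3-win32\config.ini --configdir=c:\Progra~1\mysql-cluster-gpl-7.1.3-win32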
When I start the ndb_mgmd instances on both servers and issue a show command in ndb_mgm, on the first management server I can see that it has started:
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @192.168.66.2 (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0, Master)
id=3 @192.168.66.113 (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @192.168.66.114 (mysql-5.1.44 ndb-7.1.3)
id=6 (not connected, accepting connect from 192.168.66.2)
[mysqld(API)] 2 node(s)
id=4 @192.168.66.2 (mysql-5.1.44 ndb-7.1.3)
id=5 @192.168.66.113 (mysql-5.1.44 ndb-7.1.3)
ndb_mgm>
I have yet to start the second management instance on the second management server, so the following line (from the ndb_mgm output above) is perfectly OK:
id=6 (not connected, accepting connect from 192.168.66.2)
Then I go to the second management server (192.168.66.2) and start ndb_mgmd. After starting it, I issue a show command against it:
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 (not connected, accepting connect from 192.168.66.2)
id=3 (not connected, accepting connect from 192.168.66.113)
[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from 192.168.66.114)
id=6 @192.168.66.2 (mysql-5.1.44 ndb-7.1.3)
[mysqld(API)] 2 node(s)
id=4 (not connected, accepting connect from 192.168.66.2)
id=5 (not connected, accepting connect from 192.168.66.113)
ndb_mgm>
Instead of listing both management nodes as connected, the second management node reports only itself as connected. Going back to the first management server at 192.168.66.114 still gives the same output as before the second ndb_mgmd was started, i.e. ONLY the management node at 192.168.66.114 is connected:
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @192.168.66.2 (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0, Master)
id=3 @192.168.66.113 (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @192.168.66.114 (mysql-5.1.44 ndb-7.1.3)
id=6 (not connected, accepting connect from 192.168.66.2)
[mysqld(API)] 2 node(s)
id=4 @192.168.66.2 (mysql-5.1.44 ndb-7.1.3)
id=5 @192.168.66.113 (mysql-5.1.44 ndb-7.1.3)
ndb_mgm>
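For what it's worth, my understanding is that with two management nodes every ndbd and mysqld process should be given a connect string that lists both management hosts, and that each management server can be queried separately from one machine. I'm sketching what I mean below using the standard -c / --ndb-connectstring options; the exact invocations are illustrative, not copied from my setup:

REM query each management server non-interactively
ndb_mgm -c 192.168.66.114:1186 -e show
ndb_mgm -c 192.168.66.2:1186 -e show

REM data nodes and SQL nodes pointed at both management hosts
ndbd --ndb-connectstring=192.168.66.114:1186,192.168.66.2:1186
mysqld --ndbcluster --ndb-connectstring=192.168.66.114:1186,192.168.66.2:1186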
I've spent many hours trying to figure out what's wrong, but to no avail. Please also take a look at the ndb_mgmd log file of the first management server; the following excerpt was taken immediately after starting the second ndb_mgmd at 192.168.66.2:
2010-05-21 16:05:04 [MgmtSrvr] INFO -- Reading cluster configuration from 'c:/Progra~1/mysql-cluster-gpl-7.1.3-win32/config.ini'
2010-05-21 16:05:04 [MgmtSrvr] WARNING -- at line 45: Cluster configuration warning:
arbitrator with id 6 and db node with id 2 on same host 192.168.66.2
Running arbitrator on the same host as a database node may
cause complete cluster shutdown in case of host failure.
2010-05-21 16:05:04 [MgmtSrvr] INFO -- Config equal!
2010-05-21 16:05:04 [MgmtSrvr] INFO -- Mgmt server state: nodeid 1 reserved for ip 192.168.66.114, m_reserved_nodes 1.
2010-05-21 16:05:04 [MgmtSrvr] INFO -- Id: 1, Command port: *:1186
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 127.0.0.1:3727: Connected!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- Sending CONFIG_CHECK_REQ to 1
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- Got CONFIG_CHECK_REQ from node: 1. Our generation: 1, other generation: 1, our state: 2, other state: 2, our checksum: 0xc7202738, other checksum: 0xc7202738
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- Send CONFIG_CHECK_CONF to node: 1
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- Got CONFIG_CHECK_CONF from node: 1
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 192.168.66.113:51051: Connected!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 192.168.66.2:65492: Connected!
2010-05-21 16:05:04 [MgmtSrvr] INFO -- Node 1: Node 6 Connected
2010-05-21 16:05:04 [MgmtSrvr] INFO -- Node 6 connected
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- Sending CONFIG_CHECK_REQ to 6
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- Got CONFIG_CHECK_CONF from node: 6
2010-05-21 16:05:04 [MgmtSrvr] INFO -- Node 1: Node 3 Connected
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 192.168.66.113:51051: Stopped!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 192.168.66.113:51051: Disconnected!
2010-05-21 16:05:04 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 192.168.66.2:65492: Stopped!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 192.168.66.2:65492: Disconnected!
2010-05-21 16:05:05 [MgmtSrvr] INFO -- Node 3: Prepare arbitrator node 1 [ticket=16800008ebadb656]
2010-05-21 16:05:05 [MgmtSrvr] INFO -- Node 2: Started arbitrator node 1 [ticket=16800008ebadb656]
Personally, I find the following two lines from the above output interesting:
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 192.168.66.2:65492: Stopped!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG -- 192.168.66.2:65492: Disconnected!
There's no error message though, it just says Stopped and Disconnected.
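As an aside, I take the arbitration warning in the log (arbitrator node 6 on the same host as data node 2) to be separate from the connection problem. If I wanted to silence it, my understanding is that the management node on 192.168.66.2 could be excluded from arbitration by giving it rank 0 in config.ini, roughly like this:

[NDB_MGMD]
Id=6
HostName=192.168.66.2
ArbitrationRank=0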
Can anyone figure out what's wrong with my setup? Any help would be much, much appreciated.
http://dev.mysql.com/tech-resources/articles/mysql-cluster-for-two-servers.html
The above link is the step-by-step guide I used to implement my cluster setup. The article is written for a Linux-based setup, but the steps are almost identical on Windows; only the command-line syntax differs. The only extra work is placing the MySQL Cluster files in the appropriate locations on your Windows box and installing the services manually. Once that's done, everything is driven by the configuration files.

For starters, copy the installation files into place and start everything from the command line. That makes it easy to troubleshoot any issues that arise, because you'll see errors directly in the command prompt windows; otherwise you'd have to configure logging first and dig through the logs to find out what's happening. Mind you, you have to be a magician to get everything working on the first try, so leave logging aside at first and look directly at the cluster services' output in your command prompt windows.

Once you have everything working, go ahead and install the "daemons" as Windows services. I can assist you with this if you need help. Installing the services can be a tricky task; although it has been a while since I set up my cluster, I remember it took me some time, and I got there by trial and error. From memory, you should use ONLY 8.3 (short) names in your paths when installing the services; otherwise mine failed to start (or failed to install as services, I don't remember exactly).
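To give you an idea of what I mean about 8.3 paths, the service installation commands look roughly like this from memory. Treat the service names and option placement as a sketch and double-check against the 7.1 Windows documentation; I believe most other options are better kept in the configuration files than on the --install command line:

REM run from an elevated command prompt, using 8.3 (short) path names
c:\Progra~1\mysql-cluster-gpl-7.1.3-win32\bin\ndb_mgmd --install=ndb_mgmd
c:\Progra~1\mysql-cluster-gpl-7.1.3-win32\bin\ndbd --install=ndbd
c:\Progra~1\mysql-cluster-gpl-7.1.3-win32\bin\mysqld --install mysqlcluster --defaults-file=c:\Progra~1\mysql-cluster-gpl-7.1.3-win32\my.ini

REM then start them as services
net start ndb_mgmd
net start ndbd
net start mysqlcluster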
Guys, this one actually fixed itself. Don't know why, but later today the second management node started connecting properly without my intervention.