nocand abort

Started by jimh

jimh

Hi Alain,

While experimenting with the nocan setup, I managed to write some bad code that ultimately caused nocand to abort. The problem is that with the bad code on a canzero, nocand repeatedly aborts, and you then cannot load a new version because the upload requires nocand to be running.

I managed to recover the two failing nodes by setting up a new string of 2 canzeros and making sure I had a led2 channel visible. I swapped the node that had been running led1 with one of the failing nodes, made sure everything was running, then updated the code on that node.

I have provided only snippets of the log and code, but can provide the full working code, the non-working code and the log file if required.

This raises a couple of things that probably need to be addressed:

  1. Once nocand fails, there should be a way to force-load a canzero node so it can be recovered. Perhaps something like:

    nocand upload -force …..

  2. From the trace, it looks like nocand aborts if it cannot find a channel, i.e. when the channel does not already exist. This really should be a warning, not an abort.

2a. This raises a usage question: does a node always have to register a channel? The code attempts to check whether a channel exists (i.e. a lookup) and only registers it if it does not.
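Point 2 can be made concrete with a small C++ sketch. It is not nocand's code (nocand is written in Go); `ChannelTable`, `handleChannelLookup`, `kLookupOk` and `kLookupNotFound` are hypothetical names used only for illustration. The lookup returns an empty `std::optional` for an unknown channel, and the handler turns that into a logged warning plus a failure status for the node, instead of dereferencing an entry that does not exist.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <string>

// Hypothetical daemon-side table mapping channel names to channel ids.
class ChannelTable {
public:
    // Returns the id of a known channel, or std::nullopt if it does not exist.
    std::optional<uint16_t> lookup(const std::string& name) const {
        auto it = ids_.find(name);
        if (it == ids_.end()) return std::nullopt;
        return it->second;
    }

    // Records (or overwrites) the id of a channel.
    void add(const std::string& name, uint16_t id) { ids_[name] = id; }

private:
    std::map<std::string, uint16_t> ids_;
};

// Hypothetical status codes returned to the requesting node.
constexpr int8_t kLookupOk = 0;
constexpr int8_t kLookupNotFound = -1;

// Handles a channel-lookup request from `node`. An unknown channel is
// logged as a warning and reported back as a failure; the stored id is only
// read when the lookup succeeded, so the daemon keeps running.
int8_t handleChannelLookup(const ChannelTable& table, int node,
                           const std::string& name, uint16_t* out_id) {
    assert(out_id != nullptr);  // caller must supply somewhere to store the id
    std::optional<uint16_t> id = table.lookup(name);
    if (!id) {
        std::cerr << "WARNING: node " << node
                  << " failed to find id for channel " << name << '\n';
        return kLookupNotFound;
    }
    *out_id = *id;
    return kLookupOk;
}
```

With only `temperature` in the table, a lookup of `led2` from node 1 prints the warning, returns `kLookupNotFound` and leaves `*out_id` untouched.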

Regards,

Jim

A snippet of the code that I think causes the problem:

// ledPin, lid, tid, hid, dht, sensoreName, timer and read_sensors are declared elsewhere in the sketch.

// the setup function runs once when you press reset or power the board
void setup() {
    // set the digital pin as output:
    pinMode(ledPin, OUTPUT);

    // Initialise Nocan: retry once a second until it opens
    for (;;) {
        if (Nocan.open() >= 0)
            break;  // Nocan open success
        delay(1000);
    }

    const char* ledName = "led2";  // string literals are const in C++
    int8_t lookUpStatus;

    // lookUpStatus = Nocan.lookupChannel(const char *channel, NocanChannelId *channel_id);
    lookUpStatus = Nocan.lookupChannel(ledName, &lid);
    if (lookUpStatus < 0) {  // lookup failed, channel not registered
        Nocan.registerChannel(ledName, &lid);
    }

    Nocan.registerChannel("temperature", &tid);
    Nocan.registerChannel("humidity", &hid);

    Nocan.subscribeChannel(lid);

    // Initialize sensor device.
    dht.begin();
    sensor_t sensor;
    sensoreName = sensor.name;

    // call the read_sensors function every 60 seconds
    timer.every(60000, read_sensors);
}

I captured the following output:

2018/12/24 15:28:56 DEBUG++ SEND FRAME EXT@90140201 8: 0d 0a 14 43 4d 48 59 2f>
2018/12/24 15:28:56 DEBUG+ (20) SPI SEND 15: 090000000000000000000000000000 (SPI_OP_FETCH_DATA)
2018/12/24 15:28:56 DEBUG+ (20) SPI RECV 15: ff0d90340301000d0a14434d48592f
2018/12/24 15:28:56 DEBUG+ (21) SPI SEND 2: 0a00 (SPI_OP_RECV_ACK)
2018/12/24 15:28:56 DEBUG+ (21) SPI RECV 2: 0080
2018/12/24 15:28:56 DEBUG++ RECV FRAME EXT@90340301 0:>
2018/12/24 15:28:56 DEBUG ** Received <nocan-sys-address-configure-ack node=1, func=3, param=1, len=0, data=> **
2018/12/24 15:28:56 DEBUG+ (22) SPI SEND 15: 090000000000000000000000000000 (SPI_OP_FETCH_DATA)
2018/12/24 15:28:56 DEBUG+ (22) SPI RECV 15: ff0d90341000046c65643200000000
2018/12/24 15:28:56 DEBUG+ (23) SPI SEND 2: 0a00 (SPI_OP_RECV_ACK)
2018/12/24 15:28:56 DEBUG+ (23) SPI RECV 2: 0080
2018/12/24 15:28:56 DEBUG++ RECV FRAME EXT@90341000 4: 6c 65 64 32>
2018/12/24 15:28:56 DEBUG ** Received <nocan-sys-channel-lookup node=1, func=16, param=0, len=4, data=6c656432> **
2018/12/24 15:28:56 WARNING NOCAN_SYS_CHANNEL_LOOKUP: Node 1 failed to find id for channel led2
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1e7c18]

goroutine 25 [running]:
github.com/omzlo/nocand/controllers.(*NocanNetworkController).handleBusNodeMessage(0x106c6000, 0x1066a660, 0x107166e0)
	/home/pi/go/src/github.com/omzlo/nocand/controllers/network_controller.go:284 +0xbec
github.com/omzlo/nocand/controllers.(*NocanNetworkController).handleBusNode(0x106c6000, 0x1066a660)
	/home/pi/go/src/github.com/omzlo/nocand/controllers/network_controller.go:208 +0xf4
created by github.com/omzlo/nocand/controllers.(*NocanNetworkController).handleMasterNode
	/home/pi/go/src/github.com/omzlo/nocand/controllers/network_controller.go:186 +0x434
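The `data=6c656432` field on the `nocan-sys-channel-lookup` line of the log is the requested channel name as raw bytes. A small helper (`hexToAscii` and `hexDigitValue` are hypothetical, not part of nocand) decodes it and confirms that node 1 was looking up `led2` immediately before the panic.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the value of one hex digit, or -1 if c is not a hex digit.
int hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes a hex string such as "6c656432" into the raw bytes it encodes.
// Preconditions (checked with assert in debug builds): even length, hex digits only.
std::string hexToAscii(const std::string& hex) {
    assert(hex.size() % 2 == 0);
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = hexDigitValue(hex[i]);
        int lo = hexDigitValue(hex[i + 1]);
        assert(hi >= 0 && lo >= 0);
        out.push_back(static_cast<char>(hi * 16 + lo));
    }
    return out;
}
```

`hexToAscii("6c656432")` returns `"led2"`, the same name as `ledName` in the sketch.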

Admin

Hi Jim,
Thanks for the bug report. Nocand should normally never crash like that. We will take a look at this and issue a bug fix, probably after the holidays.
Happy Christmas,
Alain – Omzlo

Alain

A new version of nocanc (0.1.13) has been released and fixes this issue.

To answer your questions:

  • A node does not need to register a channel to subscribe to it. Your code was perfectly valid and is in fact the correct way to proceed.
  • Registering a channel that already exists does not create any issue.
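These two answers can be modelled with a toy in-memory registry, in C++ like Jim's sketch. `ToyRegistry` and `lookupOrRegister` are hypothetical and only encode the behaviour described in this reply: registering an existing name returns the id it already has, a lookup never creates a channel, and subscribing needs nothing more than a channel id. Under those assumptions, the lookup-then-register pattern and an unconditional registration end up with the same id.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Toy model of the channel rules described in the reply (hypothetical, not nocand).
class ToyRegistry {
public:
    // Registration is idempotent: an existing name keeps the id it already has.
    uint16_t registerChannel(const std::string& name) {
        auto [it, inserted] = ids_.try_emplace(name, next_id_);
        if (inserted) ++next_id_;
        return it->second;
    }

    // A lookup never creates a channel.
    std::optional<uint16_t> lookupChannel(const std::string& name) const {
        auto it = ids_.find(name);
        if (it == ids_.end()) return std::nullopt;
        return it->second;
    }

    // Subscribing only needs a channel id, however the node obtained it.
    void subscribe(int node, uint16_t id) { subs_.emplace(node, id); }

    bool isSubscribed(int node, uint16_t id) const {
        return subs_.count(std::make_pair(node, id)) > 0;
    }

private:
    std::map<std::string, uint16_t> ids_;
    std::set<std::pair<int, uint16_t>> subs_;
    uint16_t next_id_ = 0;
};

// Jim's pattern: reuse the id of an existing channel, otherwise register it.
uint16_t lookupOrRegister(ToyRegistry& reg, const std::string& name) {
    if (std::optional<uint16_t> id = reg.lookupChannel(name)) return *id;
    return reg.registerChannel(name);
}
```

In this model, registering `led2` twice returns the same id and `lookupOrRegister` returns that id as well, which is consistent with both answers: the lookup-first code is valid, and a duplicate registration does no harm.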

Best regards,
Alain – Omzlo

jimh

Hi Alain,

Interesting: the patch is to nocanc, not nocand. I will test this out over the next few days.

Best Regards,
Jim

Alain

Sorry, my mistake: both nocanc and nocand have been patched. The key patch applies to nocand, of course.
Best regards,
Alain – Omzlo

jimh

Hi Alain,

I have been testing the latest bits over the past 4-5 days and all looks good, with no further problems found.

Thanks.

Best regards,
Jim