Archive for the “The Downside of software development” Category

HA+ESPHome no good for critical systems

Posted in ESPHome, Home Assistant, OpenSprinklette, RaspingBreathburryDOodlePi, The downside of Opensource, The Downside of software development on May 2, 2020 by asteriondaedalus

Ah, actually, I think I have to move away from the ESPHome and Home Assistant combination for OpenSprinklette in any event.

So I am no longer interested in chasing down a bug I found in the HA REST API while using curl to turn sprinklers on with a POST to the switch services. I was planning to use that to write a Kotlin app for an old Android tablet, since 1) I can’t get HA to open in the old version of Chrome on the tablet and 2) the HA app for Android won’t install on it. The bug sits squarely in the HA REST API, since I am only using the documented examples and the POST does not work for switches, or at least not for switches implemented using ESPHome.
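For reference, a POST to the switch services has roughly the shape sketched here, in Python rather than curl or Kotlin. It assumes HA's documented `POST /api/services/<domain>/<service>` endpoint and a long-lived access token; the host `hass.local`, the token and the entity id `switch.sprinkler_1` are hypothetical placeholders. The sketch only builds the request and does not send it; whether HA honours it for ESPHome switches is exactly the open question.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own HA host and long-lived token.
HA_URL = "http://hass.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"


def switch_request(entity_id: str, on: bool) -> urllib.request.Request:
    """Build a POST to HA's switch.turn_on or switch.turn_off service."""
    service = "turn_on" if on else "turn_off"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        f"{HA_URL}/api/services/switch/{service}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


req = switch_request("switch.sprinkler_1", on=True)
# urllib.request.urlopen(req) would actually send it to the HA server.
```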

What made me drop the REST API bug has nothing to do with the REST API: my raspingdoodleburry pi crashed. With the Pi crashed and offline, I found the ESPHome-based sprinkler system had 3 of its 4 relays on. Even worse, bringing the Pi, and therefore HA, back online does not automatically reset the ESPHome device. Sure, you could set something up on HA start/restart, but that does nothing for the span of time the Pi/HA is down and the relays on the ESPHome device are on. It is way too fiddly to sort out all the likely mishap cases, especially when the actual cause of the Pi/HA crash, and of the breakdown in HA/ESPHome communication that left the relays on, is unknown. So, even if the Pi/HA came back up by itself (some time after it crashed), there’d still likely be $K water bills in the mail.

Having had a $3k water bill some time back due to a sprinkler system leak, I reckon it makes no sense to keep this combination of HA and ESPHome for a “mission critical” application, especially since I am not sure when the p p p p p Pi went d d d d down.

The evidence for an HA/ESPHome interaction problem is that the relays in this HA/ESPHome version are driven by automations that turn them on and off in sequence, with a gap between one turning off and the next turning on, to avoid having to deal with sharing the water pressure. This works fine with HA up and running. So having 3 of 4 on must be an artefact of the Pi+HA crash and, likely, of some interaction between HA and ESPHome. That is, there are no automations in HA that would turn 3 of 4 relays on at the same time, and by design no two relays should ever be on at once. The automations have been running on a test rig, coming on at sunrise and sunset without a problem, and until the Pi crashed the logs reflected that expected behaviour.

That likely means the usual ping-ponging between two user groups, each blaming the “other” software, and, frankly, that’s not my bag. I want to water my gardens, not solve problems caused by untested use cases.

Not a problem: I have previously hand-written code to run on the WEMOS UNO format board for the sprinkler system. That code is MQTT driven and was designed from scratch for a sprinkler system, with ample safeguards. For example, it won’t turn on water for less than 5 mins, won’t accept more than a 15 minute request, and will turn off a sprinkler after 20 minutes. It will accept a new time setting over the top of the old one, so if you want an hour of watering you just ping it with four 15 minute requests, one every 15 minutes. I have a node-red setup that will also send an SMS if a sprinkler node (either a 4x or 1x WEMOS setup) raises an LWT (MQTT Last Will and Testament). I’ve even designed my own boards to support the 24VAC to WEMOS voltage conversion.
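The safeguards above can be modelled in a few lines. This is a minimal Python sketch of one reading of the rules, not the actual WEMOS firmware: requests outside 5 to 15 minutes are rejected (an assumption; the real code might clamp instead), a new request replaces the running timer, and a hard 20 minute cutoff after the last accepted request turns the zone off regardless. The class and constant names are hypothetical, and times are plain minutes.

```python
MIN_RUN_MIN = 5        # requests shorter than this are rejected (assumed: not clamped)
MAX_REQUEST_MIN = 15   # longest single request accepted
HARD_CUTOFF_MIN = 20   # fail-safe window after the last accepted request


class Zone:
    """One sprinkler relay with device-side safeguards (hypothetical model)."""

    def __init__(self):
        self.off_at = None        # minute at which the current request expires
        self.last_request = None  # minute of the last accepted request

    def request(self, minutes, now):
        """Accept a run request; a new request replaces the running timer."""
        if not MIN_RUN_MIN <= minutes <= MAX_REQUEST_MIN:
            return False
        self.off_at = now + minutes
        self.last_request = now
        return True

    def is_on(self, now):
        """On until the request expires or the hard cutoff passes, whichever is first.

        With requests capped at 15 minutes the 20 minute cutoff is belt and
        braces: it only bites if the timer logic itself goes wrong.
        """
        if self.off_at is None:
            return False
        hard_stop = self.last_request + HARD_CUTOFF_MIN
        return now < min(self.off_at, hard_stop)


# An hour of watering: four 15 minute requests, one every 15 minutes.
zone = Zone()
for t in (0, 15, 30, 45):
    zone.request(15, now=t)
print(zone.is_on(59), zone.is_on(60))  # True False
```

The point of the design is that the off decision lives on the device: if the server never sends another message, the zone still stops on its own, which is exactly what the HA/ESPHome combination did not guarantee.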

I also have a PEG-based parser that allows sprinkler times to be set, across multiple single and quad sprinklers, via the titles of Google Calendar events. Though, I want to decouple from Google and try the same with a local mechanism, for both principled and practical reasons.

So, I am also looking at how to hook my parser code into other calendar widgets. I have yet to look at HA calendar widgetry, if it exists: automations are fine, but the natural thing is to want to use a calendar, even just a weekly one rather than a full-year one, or so I imagine I think I want to believe. Then again, node-red calendar widgetry or HA calendar widgetry? It will depend upon ease of integration.

In the end, I had actually suspected my hand-coded sprinkler option was the better approach, for exactly the reasons the crashing doodle woodle pi and the interaction between HA and ESPHome have revealed. I would even suspect the ESPHome website, and the HA one too, carry caveats and warnings about not using either for mission critical applications. Especially if it’s the emergent properties of the integration that get you, and no one is really going to arbitrate that except the end users.

Lucky I found this problem on the test bench.

UPDATE

I poked around a bit more with this; the problem is a little more insidious.

I thought the server had crashed because I could not log in with MobaXterm. In fact, my server has a wired Ethernet connection to a wifi extender, and it turned out the wifi extender had been dropping its connection to the ADSL router. A real pain, since I bought a new wifi extender BECAUSE I had the same problem with the older one AND BECAUSE the “help” on the Net suggested there were problems with the older one.

Reconnecting the wifi extender to the ADSL router revealed an interesting picture. The ESPHome device must have reconnected to the HA server, since the switches on the HA HMI representing the relays in the ESPHome device showed the same state as the device: 1, 2 and 4 on, 3 off.

Not a problem, you say? Well, it is, since the system is set up so that you turn the switch on at the HA HMI, and then an automation runs and turns the switch off after 15 minutes.

On top of that, the same behaviour was relied on when driving the relays by sunrise/sunset: a delayed trigger turned the switch on at sunrise/sunset, and the automation that turns the switch off after 15 mins was then expected to fire on the switch being turned on. It worked a treat while the connection between HA and the ESPHome device stayed up.

The problem remained that three relays on the ESPHome device were on (1, 2, 4) and one was off (3). Three switches on the HA HMI were on (1, 2, 4) and one was off (3). Yet the HA log reported all switches off. So there was a disconnect between states somewhere inside HA.

The automations for all four switches are triggered on sunrise/sunset, with delays that turn the individual switches on 20 minutes apart. A second set of automations, each triggered by its own switch going on, turns that switch off after 15 mins. So the system works like this:

  • sunrise->all four switches triggered, each with its own delay of x mins
  • 0 mins->switch 1 on, triggers switch 1 off automation
  • 15 mins->switch 1 off by switch 1 off automation
  • 20 mins->switch 2 on, triggers switch 2 off automation
  • 35 mins->switch 2 off by switch 2 off automation
  • 40 mins->switch 3 on, triggers switch 3 off automation
  • 55 mins->switch 3 off by switch 3 off automation
  • 60 mins->switch 4 on, triggers switch 4 off automation
  • 75 mins->switch 4 off by switch 4 off automation
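The sequence in the list can be checked with a small Python sketch that rebuilds the on/off times from the two numbers that matter (zones start 20 minutes apart, each runs 15 minutes) and confirms no two zones overlap. The function names and parameters are illustrative only, not taken from the HA automation YAML.

```python
def schedule(zones=4, spacing_min=20, run_min=15):
    """Return (zone, on_minute, off_minute) tuples relative to sunrise/sunset."""
    return [(z, (z - 1) * spacing_min, (z - 1) * spacing_min + run_min)
            for z in range(1, zones + 1)]


def overlaps(sched):
    """True if any zone turns on before the previous zone has turned off."""
    return any(nxt[1] < cur[2] for cur, nxt in zip(sched, sched[1:]))


plan = schedule()
for zone, on, off in plan:
    print(f"{on:>2} mins->switch {zone} on, {off:>2} mins->switch {zone} off")
```

With these numbers there is always a 5 minute gap between zones, so a state with two or more relays on at once (like the 1, 2 and 4 seen here) cannot come from the automations working as designed.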

Nothing, then, really accounts for why 3 of 4 relays were on, why the HMI reflected the ESPHome device state, or why, with relays on in the ESPHome device and the associated switches on in HA, the switches were not turned off after 15 mins. That is, the ESPHome device has been powered up with 3 of 4 relays on for two days, when they should have gone off after 15 mins if HA had triggered them on. If I manually set the switches at the HA HMI to off today, they are then off at the device. I haven’t rebooted the device because I want to see whether state coherence is re-established. I assume the states will be coherent again if the automations run tonight and the relays behave properly again. Noting that that won’t redeem the setup for use in critical applications, since it would take too much energy to understand the interplay, and the problem is actually a state coherency problem in the design of HA, ESPHome, or both.

So there is no real way to account for 3 being off, for example, since all four switches run with exactly the same automation descriptions. You would expect all four on or all four off, not a mixture.

Ahhhhhhhhhhhhh … no.

Posted in The downside of Opensource, The Downside of software development, thingbox on September 2, 2018 by asteriondaedalus

So. I put Ubuntu Core onto OPi Zero.

I snapped in Mosquitto and Node-Red.

Mosquitto is installed as a service, Node-Red is not.

I started Node-Red and set up a flow to send a timestamp every 5 minutes via MQTT to debug.

Left it all night.

In the morning the server was disconnected.  I did expect this if a snap update had been pushed through.

I did discover you cannot turn off updates, which many people are concerned about, since the system could update and reboot in the middle of something crucial.

The problem: I was expecting to log back in and simply restart Node-Red from a Tera Term console.

Not a sausage.

The serial port was still responding and putting out the IP address of the board.

I could ping the IP address but I could not log in.

I ended up having to pull the power to force a full reboot.

Not very useful.

Not even sure where to start.  Is this a problem with Ubuntu Core and Snap, a problem with the installed snaps, or are we back to the suspicions about the OPi Zero hardware?

Gag and double frack!

Posted in Sucky Wucky RaspingBreathBurry, The downside of Opensource, The Downside of software development on March 28, 2018 by asteriondaedalus

I have officially given up on the OrangePi Zero as the server for my house.  The system drops off the LAN after a week or two of running, which is no good if it’s running sprinklers and lights etc.

I regretfully inform I have chosen to get a RaspingDoodleBerryPi to drop TheThingBox onto it.

I don’t otherwise have to admit having one.

I got a black case and I will mount it out of sight on the back of a black WiFi router.

It will be between you and me.

At least I can give up on this time wasting debugging.

I did almost steal an Odroid C1 off my cluster.  Almost.

Still, I have two small linux boards now.

I am going to re-burn SD and try to build moos-ivp on one of them.

Why not? It worked a treat on my BBB.

 

Self induced sabbatical from fun

Posted in Orange Pi, Sucky service Providers, The downside of Opensource, The Downside of software development on February 20, 2018 by asteriondaedalus

Well, not really.

I had another paper to write to get me to a conference.

In the meantime, the dopey OrangePiZero keeps needing a reboot every couple of weeks.  Not sure whether it’s the node-red, the mqtt server OR the hardware.

Previously, when the website went down, I could still log in via the serial port – so I suspect the hardware again.

I will likely need to pull an ODROID-C off my cluster to take over the house again.

The OrangePiZero will at least be a great little brain for robots.

Stuck again

Posted in The downside of Opensource, The Downside of software development on December 18, 2017 by asteriondaedalus

So, I have my iPEGA bluetooth gamepad and I want it to work with my dopey Android phone, which I am using as the mediator for the rover.

Nothing useful for using these gamepads in Processing for Android so that appears to be out.

No heart or interest to write something in Java.

Python?  Well pygame for Android seems a dud, a lot of old blog entries but nothing obviously working anymore.

I noted that they had sorted out some of Kivy’s flakiness, so I re-installed Kivy on my PC with a view to prototyping on Windoze, then bombing it onto Android.

Go figure, I can get the pygame gamepad exerciser working on PC.  I tried the same with the raw Kivy version.  It works!  Well, as long as you don’t allow your gamepad to go to sleep.  If you do, Kivy drops it.  And if you leave it disconnected long enough, the Kivy application quietly closes down.

Posted a question on Kivy.org and an issue on GitHub, but no responses at all from anyone.

It is not obvious from the Kivy code why it drops the gamepad.  There are otherwise no hints in any of the examples or any of the documentation.

In the pygame version, the gamepad can go to sleep and when it wakes it is auto-connected again.

So, I’m stuck, as I still have to see if I can get the old pygame for Android to work with the gamepad.  I would have used Kivy, but help is a dead end.

 

So much help, so little help

Posted in The Downside of software development on December 8, 2017 by asteriondaedalus

So, I stupidly thought I would upgrade my Elixir, crack open my book on Phoenix and plod through the examples while home sick with the flu.

Book is Programming Phoenix (eBook) P1.0 from PragProg

That broke the build, so I had to strip out the hex and mix folders and re-install Hex, Elixir and Phoenix.

And then all I got was a broken connection to Postgres!  Why?  I was going to go up from 9-odd to 10 but backed out after I saw how long 10 would take to download (and it wasn’t needed).  I had already uninstalled 9 and just re-installed it.  BUT the build would fail because username/password authentication failed.

Hmm.

When you uninstall Postgres on Windows it doesn’t, of course, remove the data directory (where the usernames and encrypted passwords are stored).   If you reinstall over the top, a bug or quirk of the installer is that it no longer prompts for the superuser password, port etc.  (obviously because they are buried in the data directory).

All fixed by manually deleting the data directory and re-re-installing Postgres.

POSTSCRIPT

GAG!  My book appears to be out of date now!

At least, they have moved all the app directory structures around.

Pragmatic Programmers KEEP UP!

Although it looks to be the difference between Phoenix versions 1.2 and 1.3, which is fine (I just have to remember to use the phoenix.x rather than the phx.x idiom for building apps).

POSTPOSTSCRIPT

There are problems with the errata page: rather than the author drafting errata as people report problems, the errata page works like a user group, so you get the same drift away from the facts.

The problems also include changes between versions of Ecto.

If there is an error, go figure there will be at least two solutions offered by readers, BUT as they don’t state the Elixir, Phoenix and Ecto versions they are using (while working through the book) you likely need to try each of the different solutions until one works for you.

Looks like the best option is to install the Elixir, Ecto and Phoenix versions used in the book.

Still, I have had to work out things that don’t appear to have been reported in the errata, so I guess I am learning little titbits along the way.

Soooooooooooo…

Posted in Linux, Orange Pi, Rant, The Downside of software development on June 15, 2017 by asteriondaedalus

…weird!

So, the story up to this point: I added the expansion board to my OPiZ again.  That fragged the board somehow, with the weird side effect that it would no longer boot the orangepi.org version of Debian server but still happily booted the armbian.org distro of Debian server.

Weird because the orangepi.org Debian server is derived (I think) from armbian.org’s distro (at some point).

Weird because the orangepi.org distro booted happily before the board was fragged by inserting the expansion board.  Weird because, even if you took the expansion board out, the orangepi.org distro would not boot on the OPiZ any more.

Weirder still was that, once the expansion board fragged the baseboard, the armbian.org distro could not be given a static IP via nmtui on the OPiZ.  That was because, after fragging, nmtui would hang the armbian.org distro whenever you tried going to the connections page.

Weird because you could happily set the hostname using nmtui??!!

So, go figure.  I thought to try nmcli.

From examples on the internet it should have gone something like this (for my setup at least):

root@house:~# nmcli con edit 'Wired connection 1'
nmcli> set ipv4.method manual
nmcli> set ipv4.addresses 192.168.0.100/24
nmcli> set ipv4.gateway 192.168.0.1
nmcli> set ipv4.dns 61.9.226.33 61.9.226.1 8.8.8.8 8.8.4.4
nmcli> save
nmcli> quit
root@house:~#

However, a quirk of Armbian is that there is no ipv4.gateway property.  Without it I had set up a static IP but could not get out to the internet (no gateway).

I came across a possible back-door route with the following:

root@house:~# nano /etc/network/if-up.d/gwconfig
#!/bin/sh

if [ "$IFACE" = "eth0" ]; then
 route add default gw 192.168.0.1
fi
root@house:~# chmod a+x /etc/network/if-up.d/gwconfig

You can then just reboot, and every time you boot “route add default” sets your gateway.

Except in Armbian, of course, that route command does not work with those parameters!

DOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOooooooohhhh!

However, while the help never mentioned “gateway”, there was some discussion in the nmcli help on “addresses”, and in other places, that talked about “hops”.

So, AH HA moment (reaching back to that neuron containing networking 101).

So it turned out the fix, while not obvious, is:

root@house:~# nmcli con edit 'Wired connection 1'
nmcli> set ipv4.method manual
nmcli> set ipv4.addresses 192.168.0.100/24 192.168.0.1
nmcli> set ipv4.dns 61.9.226.33 61.9.226.1 8.8.8.8 8.8.4.4
nmcli> save
nmcli> quit
root@house:~#

Notice the gateway is now set in ipv4.addresses, after the static IP for the OPiZ.
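Because the gateway now rides along in ipv4.addresses, a slip is easy to miss: for instance, 192.169.0.100/24 with a 192.168.0.1 gateway puts the gateway off-link, and you are back to no route out. A quick sanity check with Python’s standard `ipaddress` module confirms the gateway sits inside the static address’s subnet. The helper name is hypothetical; nmcli does nothing like this for you.

```python
import ipaddress


def gateway_on_link(address_cidr: str, gateway: str) -> bool:
    """True if the gateway lies inside the subnet of the static address."""
    iface = ipaddress.ip_interface(address_cidr)
    return ipaddress.ip_address(gateway) in iface.network


print(gateway_on_link("192.168.0.100/24", "192.168.0.1"))
```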

To check this you can use “route -n”; you should see something like:

[Screenshot: output of route -n]

The garbage preceding it in the screenshot is node-red shutting down – sweet.

So, now I have static IP setup on my OPiZ – not the happiest route … ha ha … get it?

Now all I have to do is sort out the stupid node-red and emqttd problem where, despite quite straightforward steps, neither will start for me as a service.  That is the next step since, as discussed, I want the OPiZ to start up with things a-blazing.  Although, there might be a back-door route through startup scripts.

Stereo slam dunk

Posted in Python RULES!, Sensing, The Downside of software development, Vision on June 11, 2017 by asteriondaedalus

With some pain, I got the stereo camera that turned up from AliExpress the other day to work (provisionally).

[Image: stereo_slam1]

This is on my windoze PC using 64-bit Stackless Python and OpenCV 3.2.

The trick that stopped me for two days was working out why either camera would work on its own, but both together hung.  Swapping the order gave the same thing.

It turned out to be USB 2.0 choking.  So the fix was to work out how to set the image size small enough for the two camera streams to share the one USB port.

Camera is this one:

[Image: stereo_slam2]

The specs are: 1280*720 MJPEG 30fps 120 degree dual lens USB camera module HD CMOS OV9712.  That is, as it turns out, a lie in this configuration.  The device is USB 2.0, so it chokes when trying to pump both streams through at the same time.  Some work will be needed to sort out the maximum resolution the cameras can be set to – there is likely some black magic math somewhere (or trial and error).
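The black magic math can at least be roughed out. This Python sketch assumes the streams arrive uncompressed as YUYV (2 bytes per pixel) rather than MJPEG, and uses USB 2.0’s limit of 80% of the 480 Mbit/s bus for periodic (isochronous) traffic, i.e. 48 MB/s, as the budget both cameras share. Real UVC cameras often reserve more bandwidth than their stream needs, so treat the answer as a starting point for trial and error, not a guarantee. The names are hypothetical.

```python
USB2_PERIODIC_BUDGET = 480_000_000 * 0.8 / 8  # bytes/s: 80% of 480 Mbit/s


def stream_bytes_per_sec(width, height, fps, bytes_per_pixel=2):
    """Raw bandwidth of one uncompressed (assumed YUYV) video stream."""
    return width * height * bytes_per_pixel * fps


def largest_fitting(resolutions, cameras=2, fps=30, budget=USB2_PERIODIC_BUDGET):
    """Largest resolution (by pixel count) whose combined streams fit the budget."""
    fitting = [r for r in resolutions
               if cameras * stream_bytes_per_sec(*r, fps) <= budget]
    return max(fitting, key=lambda r: r[0] * r[1], default=None)


candidates = [(1280, 720), (800, 600), (640, 480), (320, 240)]
print(stream_bytes_per_sec(1280, 720, 30))  # one stream at the advertised spec
print(largest_fitting(candidates))
```

Under these assumptions a single 1280x720 stream at 30 fps already exceeds the periodic budget, so two certainly will. In OpenCV the capture size is requested per camera with `cap.set(cv2.CAP_PROP_FRAME_WIDTH, w)` and `cap.set(cv2.CAP_PROP_FRAME_HEIGHT, h)`.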

I haven’t used much science in the selection (I waited until prices dropped and grabbed the lowest priced one at the time).  I opted for a wider field of view because I suspect it creates greater disparity between points to help localisation – however, don’t quote me, as that is not backed up by any reading at the moment.

The hangup at the moment is that, while the two cameras are working, OpenCV has rather a lot of matrix types and, as usual, the rotten thing has “thin” or sporadic documentation.

If you find any “help”, it will be using deprecated functions (from previous versions of OpenCV) or be in C++ etc.

Even just a disparity map, which uses the stereo image pair to show depth planes, needs matrix conversions.

Still, once these are worked out I can buzz out a design on the PC before migrating to an embedded form factor (C.H.I.P., ODROID-C0 or Orange Pi Zero, perhaps even an old Android phone).

I am after something to pump a point cloud out.  Using mono-slam is fun, but I am not sure that having to get the camera video processing and platform pose working together is the happiest medium – especially since people are helping out with stereo cameras like this.

Goddamit!

Posted in The Downside of software development on April 24, 2017 by asteriondaedalus

Microsoft are so lame.

Go figure, I bit the bullet and decided to play with MRPT.  The straightforward approach was to run it on my PC so, yes, yet another version of Visual Studio (2013 this time).  It is always with great trepidation that I grab yet-another-version of VS, as MS never seem to have managed the licensing interactions between versions.  I have had problems simply from installing two different versions of the toolset.  The problem is amplified because CMake is being used.  So, if you mention CMake in the help room, when discussing the context in which the error is raised, some dopey MS/VS guru will dump on you for using OPENSOURCE software (CMake) even though you’re using the FREE version of VS.

In any event.

Stung again somewhat.

Install taking a stupid long time.

Why.

It’s installing all the components I told it not to in the original panels of the wizard.

Why on earth do I want the Windows 8 for phones SDK?

Upgrading Quartus

Posted in The Downside of software development on May 9, 2014 by asteriondaedalus

Well, that was my fault for being lazy.  Since the Quartus downloads take so much time, once I found 13.1 didn’t have support for Cyclone II, I played safe and went for 10.1 sp1.

I am now downloading and installing 11.1 sp2 software to get around problems with Qsim (hopefully) as well as giving me Cyclone II and Max II support (still).

Go figure though, I installed 11.1 University Program (UP) with Qsim over 10.1 okay and Qsim would run, though its wave editor would not.

Now I install 11.1 UP over 11.1 Quartus and naught, nothing, nada, no Qsim installed.

On top of that, the 11.1 install of ModelSim does not recognise the 11.1 install of Quartus, even when I manually point it at the directory, so it won’t install.

Why did I bother?

POST SCRIPT

Where we are at now is I have ripped out all Altera software and reinstalled:

  1. Quartus II 11.1 sp2 Build 259 into clean directory “11.1” and program group “Altera 11.1”
  2. ModelSim-Starter 11.1 sp1 Build 216 into directory “11.1” created in step above and used the same program group “Altera 11.1”

Having said that, it took quite a while with slow downloads and installing and uninstalling to get that sorted.

We will have to go back to our SOPC example to re-do it in the new Qsys tool, though that isn’t as scary as it sounds.

What we might do next is add a few custom instructions to our CPU – just to buzz out that process.

This newer development environment includes a “better” way of doing this apparently.