Archive for the Parallel Talk Category

I am in so much trouble …

Posted in Linux, Nostalgia, Parallel Talk, The after market on January 18, 2017 by asteriondaedalus

… wife will kill me!

Look what I bid on!


So, a four-CPU server with 70-plus gig of memory.  Maybe dual cores (so eight all up), maybe quad cores (so 16 all up).  Won’t know until I grab it (if I win the auction).

I will have to install a hard drive.

Turns out Debian installs just fine, based upon web searches at least.

Why?

Chapel box.

And cheaper than a Parallella.

Should be able to stuff at least one CUDA board in it – just need to find a good second-hand one.

Go figure: it powers up, but that’s all I know.

I am so dead.

Mind you, both the wife and the mother-in-law are now actually addicted to the auction website.

So I feel vindicated … but very, very afraid.

So why do I want to buy the second one they have?

Spares?

Death wish?

Is Charles Bronson channeling through me?


Parallel versus Concurrent

Posted in Parallel Talk, Sensing on December 26, 2016 by asteriondaedalus

So, I fell upon Chapel, a parallel programming language.  It makes sense for things that Erlang and Elixir are not good for – like image processing.

It isn’t Google’s,  it’s actually Cray’s.

So fun facts:

  1. When working in Canada, I came across a coffee mug from Cray in a second-hand store.
  2. You can cram a replica of a Cray into an FPGA.

Now I have Erlang up and running on my ODROID-C1 home server.  I will get Elixir and Phoenix running, as the wife will likely want something less geeky than Node-RED for her interface to the home automation.

In the meantime, for fun, Chapel is at this very moment compiling on the OD-C1.  Why not, it’s quad core.  Apparently, for machines that aren’t Cray-class monsters, you just need the communication layer compiled with the UDP conduit, a lot of fiddling, and you can have two or more nodes (locales) running.

In any event, the parallel hello world found me four cores on me OD-C1.


Black Magic

There was a configuration script to run, but other than that I did not have to tell it how many cores, so some buried smarts do the job.  Otherwise, the code to print from all four cores is as straightforward as:

// numTasks defaults to the number of tasks this locale can run in parallel.
config const numTasks = here.maxTaskPar;
// coforall spawns one task per index, so each core prints its own line.
coforall tid in 0..#numTasks do
   writeln("Hello, world! (from task " + tid + " of " + numTasks + ")");
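Nice touch: because numTasks is a config const, you can override it at run time without recompiling, e.g. ./hello --numTasks=2 (assuming the binary is called hello).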

There is also an image processing tutorial using Chapel.

There are even lecture notes around.

Accelerate 3

Posted in Parallel Talk, Python RULES! on April 21, 2014 by asteriondaedalus

Just ran a couple of examples from a pyOpenCL tutorial site.

I tried running the optimized matrix multiply but it clapped out.  I had to change “blocksize = 32” to “blocksize = 16” before it would work on my GPU, presumably because a block size of 32 means a 32×32 work-group (1024 work-items), which is more than many older GPUs allow, whereas 16×16 is only 256.  Even then it didn’t offer any impressive numbers.
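If you want to check the limit on your own card before fiddling with the tutorial code, pyOpenCL will tell you.  A minimal sketch (the loop is mine, not the tutorial’s):

import pyopencl as cl

# Print each device's work-group ceiling. A 32x32 tile needs 1024
# work-items per group; a 16x16 tile needs only 256.
for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(device.name, "max work-group size:", device.max_work_group_size)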

I then tried the Mandelbrot example (having spent some time years ago with Mandelbrots in FORTH and on my Amiga).

The grunting GPU snapped out the finished Mandelbrot in 5.1 seconds.  The serial version took about 190 seconds.  The NumPy version, however, only took 5.3 seconds???

When running the CL code but selecting the CPU instead of the GPU, it takes 1.8 to 3.2 seconds???
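For the record, switching between CPU and GPU in pyOpenCL just comes down to which device you build the context on.  A sketch of how I understand it (the helper name is mine):

import pyopencl as cl

def context_for(device_type):
    # Walk the platforms and build a context on the first device whose
    # type matches, e.g. cl.device_type.CPU or cl.device_type.GPU.
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            if device.type & device_type:
                return cl.Context([device])
    raise RuntimeError("no OpenCL device of the requested type found")

ctx = context_for(cl.device_type.CPU)   # swap in GPU to compare timings
queue = cl.CommandQueue(ctx)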

These examples are great for showing how to get code running via OpenCL and Python.  They also quickly start raising questions about algorithm design, data-transfer overhead, etc.

So, summary.

Running OpenCL on the GPU seems to take as long as running native NumPy.

Running vanilla Python (non-OpenCL, non-NumPy) on the CPU takes about 3 minutes versus the 5 seconds on the GPU.

Running the OpenCL code on the CPU appears to be about twice as fast as on the GPU.  This may be any one, or a combination, of:

  1. Intel’s OpenCL runtime versus AMD’s (who knows what sneaky tricks Intel plays to make AMD look bad).
  2. The array size being too small, relative to the data-transfer cost, to bring out the speed advantage of the GPU.
  3. The algorithm not being optimized for, or suited to, a large speed-up on the GPU.
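Suspect 2 is at least easy to measure: with a profiling-enabled queue, pyOpenCL reports event timestamps, so you can time the host-to-device copy on its own and compare it against the kernel time.  A rough sketch (the array size is an arbitrary choice of mine):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
# Profiling must be enabled on the queue to read event timestamps.
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

host = np.random.rand(1024, 1024).astype(np.float32)
buf = cl.Buffer(ctx, cl.mem_flags.READ_ONLY, size=host.nbytes)

evt = cl.enqueue_copy(queue, buf, host)
evt.wait()
# Timestamps are in nanoseconds; compare against the kernel's own event
# time to see how much of the wall clock is just moving data.
print("host-to-device copy:",
      (evt.profile.end - evt.profile.start) * 1e-6, "ms")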

There must be some rules or heuristics around for selecting problems that best fit GPU speed-up.


Accelerate 2

Posted in Development, Parallel Talk, Python RULES! on April 20, 2014 by asteriondaedalus

Okay, so a half day of fidgeting to get pyOpenCL working.

Turns out it was easy.

I have an Intel Core i5 CPU, and you just need to install the Intel OpenCL runtime.  The SDK won’t install unless you have something other than VS Express, though there appears to be a cheat to get it in anyway.  It isn’t needed for pyOpenCL.

Then install the relevant exe for pyOpenCL and run that.  I installed the 32-bit version, having opted for 32-bit Python given the raft of libraries I wanted to use.  I did try installing the AMD OpenCL SDK, thinking I would need it for the Radeon, but it turned out that wasn’t it at all.  Experimenting with code examples from various places, it looks good.
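A quick sanity check that the runtime took is to list what pyOpenCL can now see; after installing the Intel runtime, the Core i5 should show up.  A minimal sketch:

import pyopencl as cl

# List every platform and device the installed OpenCL runtimes expose.
for platform in cl.get_platforms():
    print(platform.name, platform.version)
    for device in platform.get_devices():
        print("   ", cl.device_type.to_string(device.type), device.name)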

I have copies of a couple of OpenCL books so I can dabble in the background. This is primarily for prototyping and learning the ropes of OpenCL.

I am holding off on my ARM cluster; I was about to jump in and get three ODROID-U3s, but I can wait until they bring out the U4, hopefully with an RK3288.  The ODROID-XU has OpenCL (via its Exynos Octa chipset) but is too expensive.


Nvidia SUX

Posted in Parallel Talk on April 6, 2014 by asteriondaedalus

Not really, but I do hope they sort out their anti-free-trade thinking of not selling things like the Shield and TK1 in Australia.

Now, if a country can be forced, under free trade, to export commodities, why are companies on the Internet not policed for failing to sell into countries that have free-trade agreements with the source of the technology?

All the politics aside, the TK1 sorta kinda wraps it up for Parallella and for GreenArrays, don’t you think?

Neat, though, is the latest from XMOS (XCORE-EA), which I think might still find it hard to get traction.

XCORE-EA is not targeting the same market as the TK1 or Parallella, but it might go up happily against the GA144, I suppose.  Because the XCORE-EA is a little more “familiar” to people (and they are avoiding saying “transputer”), they might have a chance.