A duplicate finder based on cryptographic hash functions

Image by haujord

Is this the same picture? Is there a copy of this file in the other folder? Recently I was going through some old files and had to ask myself these questions quite often. I guess most of you have been in a situation like this, and as you may know, it can be time-consuming to identify all duplicate files manually. So here we will explore an option to identify duplicate files automatically in Python.
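To give a rough idea of what hash-based duplicate detection can look like, here is a minimal sketch that is not taken from the article's own code. It assumes SHA-256 as the hash function and treats files with the same digest as duplicates. The names `file_digest` and `find_duplicates` are made up for this illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until handle.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Group all files below `root` by digest and return groups with more than one file."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates("some/folder")` returns a list of lists, each holding the paths of files with identical content. File names and timestamps play no role, so renamed copies are found as well.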

But before we start, and to keep it short: if you are only interested in the code, not the explanations on how to identify…

My grandmother's cookbook meets machine learning, part I

Figure 1: My grandmother's old German cookbook: “Praktisches Kochbuch” by Henriette Davidis

My grandmother was an outstanding cook. So when I recently came across her old cookbook, I tried to read through some of the recipes, hoping I could recreate some of the dishes I enjoyed as a kid. However, this turned out to be harder than expected, since the book was printed around 1911 in a typeface called Fraktur. Unfortunately, Fraktur deviates from modern typefaces in several respects. For example, the letter “A” looks like a “U” in Fraktur, and every time I see a “Z” in Fraktur I read a “3” (see Figure 2).

Image modified from garageband

The world around us is a dynamic mixture of signals from various sources. Just like the colors in the above picture blend into one another, giving rise to new shades and tones, everything we perceive is a fusion of simpler components. Most of the time we are not even aware that the world around us is such a chaotic intermix of independent processes. We only notice this mess in situations where different stimuli that do not mix well compete for our attention. A typical example is the scenario at a cocktail party where one is listening to the voice of another…

Downloading minute resolution OHLC data via exchange APIs


Because of the general interest in this matter, I created a dataset containing all OHLC data from the Bitfinex exchange API and uploaded it as a public dataset on Kaggle.


Algorithmic trading is a popular way to tackle the fast-paced and volatile environment of cryptocurrency markets. However, implementing an automated trading strategy is challenging and requires a lot of backtesting, which in turn requires a lot of historical data. While several sources provide historical cryptocurrency data, most of them have drawbacks. Either they are expensive, provide only low temporal resolution data (daily) or cover limited time…

A biologically inspired linear classifier in Python

It has been a long-standing goal to create machines that can act and reason the way humans do. And while there has been a lot of progress in artificial intelligence (AI) and machine learning in recent years, some of the groundwork was laid more than 60 years ago. These early concepts drew their inspiration from theoretical principles of how biological neural networks such as the human brain work. In 1943 McCulloch and Pitts published a paper describing the relationships of (artificial) neurons in networks based on their “all-or-none” activity characteristic. This “all-or-none” characteristic refers…
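The “all-or-none” idea can be illustrated with a simple threshold unit. The sketch below is a simplified illustration and not the article's implementation: the unit outputs 1 when the weighted sum of its inputs reaches a threshold and 0 otherwise. The function name `threshold_unit` and the chosen weights and thresholds are assumptions made for this example.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted input sum reaches the threshold, otherwise 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    """Both inputs must be active: threshold equals the number of inputs."""
    return threshold_unit((a, b), weights=(1, 1), threshold=2)


def logical_or(a, b):
    """A single active input is enough to make the unit fire."""
    return threshold_unit((a, b), weights=(1, 1), threshold=1)
```

With binary inputs, the same unit behaves like a logical AND or a logical OR depending only on the threshold, which is why such units can serve as building blocks for simple classifiers.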

Spike sorting

Epilepsy is a form of brain disorder in which an excess of synchronous electrical brain activity leads to seizures, which can range from having no outward symptoms at all to jerking movements (tonic-clonic seizure) and loss of awareness (absence seizure). For some epilepsy patients, surgical removal of the affected brain tissue can be an effective treatment. But before surgery can be performed, the diseased brain tissue needs to be precisely localized. To find this seizure focus, recording electrodes are inserted into the patient's brain, with which the neural activity can be monitored in real time. …

Biological neural networks such as the human brain consist of specialized cells called neurons. There are various types of neurons, but all of them are based on the same concept. Signaling molecules called neurotransmitters are released at the synapse, the connection point between two neurons. Neurotransmitters alter the membrane potential of the post-synaptic cell by interacting with ion channels in its cell membrane. If the depolarization of the post-synaptic cell is strong enough, an action potential is generated at the axon hillock. The action potential will travel along the axon and trigger the release of neurotransmitters into the synaptic cleft…

Structural MRI scan of the human brain (modified from toubibe)

In the first article of this series we looked at the general organisation of MRI and fMRI datasets. In the second article we moved on and investigated which parts of the brain were active during the fMRI scan by performing a correlation analysis between the data and an idealized response profile. The method worked quite well. We saw activity in the auditory cortex as expected during auditory stimulation but the map looked a bit noisy and we wanted to see if a general linear model (GLM) might give us better results.
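As a rough sketch of what such a correlation analysis involves, the example below computes the Pearson correlation between a single voxel's time series and an idealized on/off response. It is an illustration under assumptions, not the code from the series: the alternating rest/stimulation block design, the block length, and the names `boxcar` and `pearson_r` are hypothetical.

```python
import math


def boxcar(n_volumes, block_length):
    """Idealized response: alternating blocks of rest (0) and stimulation (1)."""
    return [(i // block_length) % 2 for i in range(n_volumes)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Computing `pearson_r(voxel_series, boxcar(len(voxel_series), block_length))` for every voxel yields a correlation map. Voxels whose signal follows the stimulation pattern get values close to 1, while unrelated voxels stay near 0.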

What we did so far

But before we move on, let's recall how the…

In the previous article we covered the basics about the data structure and the differences between structural and functional MRI (fMRI). In this article we move on to the analysis of the fMRI data to answer the following question: What brain regions were active during the scan?

This is actually the main objective behind doing an fMRI scan in the first place. While high-resolution MRI scans are performed to get static anatomical insights, fMRI scans aim to reveal the dynamics of brain function. In fMRI the blood-oxygen-level dependent (BOLD) signal is recorded by the MRI machine which is an…

Structural MRI scan of the human brain (modified from toubibe)

There is a growing interest in applying machine learning techniques to medical data. Brain scans from Magnetic Resonance Imaging (MRI) experiments have been a popular choice, with the number of publications combining MRI and machine learning growing exponentially in recent years (see data from PubMed below). Therefore, in this first post we will cover some of the basics about structural and functional MRI (fMRI) data to give you an idea of how the data is generally structured. …

Carsten Klein

PhD in neuroscience interested in data analysis and artificial intelligence
