Arduino is a development framework, not a chip or a circuit board. It supports development for many types of processors and ships with a large number of libraries. Both its software and hardware follow a building-block approach, which makes application development simple, convenient and fast.
Arduino is a platform. It is an open-source development platform built with Java and GNU tools, and its structure is derived from Processing, a software development tool created for artists and designers. It supports a variety of MCUs, including Atmel's ATtiny series, 8-bit AVR chips, ARM Cortex-M0 and Cortex-M3 parts, and ST's ARM Cortex-M3 and M4, etc. …
With a Raspberry Pi, a microSD card and a power supply, you can build a simple desktop computer. You also need an HDMI cable and a suitable monitor, perhaps an old one, plus a USB keyboard and mouse.
The Raspberry Pi 3 also has built-in Wi-Fi and Bluetooth; if you use another model, you need a compatible USB dongle.
Once everything is connected and the preferred operating system (the latest version of Raspbian) is installed, the desktop is ready to use.
By many estimates, one of the most popular uses of the Raspberry Pi is as a Kodi media center. Several Kodi builds have been released as disk images. …
Big data frameworks play a key role in data processing: with them, integrated processing of large-scale data becomes easy, and extracting intelligence and other useful reports from that data becomes simple as well. From manual statistical analysis to today's distributed computing platforms, these frameworks are the keystone behind the rapid increase in data processing speed and the continuous evolution of the overall architecture. Many big data frameworks are available today; the most popular are Hadoop, Spark and Storm. …
In 1993, Edgar F. Codd, the founder of relational databases, proposed the concept of online analytical processing (OLAP). Essentially, it combines multidimensional databases with multidimensional analysis capabilities, with the goal of meeting the specific query and reporting requirements of decision support and multidimensional environments. With the arrival of the Internet era, the surge in data volume brought new challenges to relational databases. The most obvious are as follows:
The cost of adding new columns is huge
Because a relational database defines a table's fields in advance, when a business scenario requires a new column in a table that already holds hundreds of millions of rows, you discover that, under the rules of the relational model, all of those rows must be touched to complete the addition of the new column (otherwise the database reports errors). This poses a great challenge to server performance in a production environment. …
In the field of data processing, workloads are generally divided into online transaction processing (OLTP) and online analytical processing (OLAP). Take shopping as an example: online transaction processing ensures that the same item is not purchased by multiple people, while online analytical processing counts how many people have purchased that product.
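As a rough illustration of the difference, the sketch below contrasts an OLTP-style write with an OLAP-style query using SQLite in Python; the orders table, its columns and the values are all invented for this example and do not come from any particular system.

```python
import sqlite3

# Hypothetical "orders" table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, user_id INTEGER, amount REAL)"
)

# OLTP-style operation: record one purchase as a small, isolated transaction.
with conn:
    conn.execute(
        "INSERT INTO orders (product_id, user_id, amount) VALUES (?, ?, ?)",
        (42, 1001, 9.99),
    )

# OLAP-style operation: scan the data to answer an analytical question,
# e.g. how many distinct users bought product 42.
row = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM orders WHERE product_id = ?",
    (42,),
).fetchone()
print(row[0])
```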
Kylin is a big data analysis engine built on the Hadoop platform. It is a tool that can return summarized results in seconds on PB-scale data sets (1 PB = 1000 TB). To give an example of summarizing data: suppose I want to know the total score of each player in my game. That is data aggregation, and this ability is amazing. …
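The kind of aggregation Kylin precomputes can be sketched with a tiny pandas group-by; the player names, scores and column names below are invented purely for illustration and have nothing to do with Kylin's actual cube-building process.

```python
import pandas as pd

# Hypothetical game-score records.
scores = pd.DataFrame(
    {
        "player": ["alice", "bob", "alice", "carol", "bob"],
        "score": [120, 80, 45, 200, 60],
    }
)

# The aggregation in question: total score per player.
total_per_player = scores.groupby("player")["score"].sum()
print(total_per_player)
```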
Looking back over the past decade of evolution in distributed computing systems makes it easier to see the relative positions of Spark and Ray. In 2004, Google proposed MapReduce as a cluster programming framework, paired with the Google File System and other technologies as its underlying storage support, and MapReduce remained popular for more than ten years afterward.
The reason for its success is that it gives programmers and data scientists a model that is easy to understand, richly expressive and highly fault-tolerant, and that makes it easy to build a distributed system on commodity hardware.
Then, around 2010, with the "memory cloud" (RAMCloud) concept proposed at Stanford, researchers realized that memory, which had seemed very expensive, was becoming cheap, and that many fault-tolerance operations that depended heavily on disk could actually be done in memory. In this context, Spark came into being, giving rise to the RDD and a series of memory-based optimizations, and it replaced earlier disk-based frameworks such as Hadoop Hive in small and medium-scale computing. Hive, however, has not been completely replaced: in very large-scale (PB-level) computing, relying on SSDs and its robustness, it is still the first choice of many companies. …
The Dataflow model aims to provide an accurate and reliable solution for stream processing. Before it was proposed, stream processing was often regarded as unreliable but low-latency, and an accurate but high-latency batch framework such as MapReduce was needed alongside it to obtain a reliable result. This is the famous Lambda architecture.
This architecture brings a lot of trouble to applications; for example, introducing multiple sets of components increases system complexity and makes maintenance harder. …
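The following toy sketch shows the idea behind the Lambda architecture's serving step: an accurate but stale batch view is merged with a fast speed view that only covers data arriving after the last batch run. All names and numbers are made up for illustration.

```python
# Batch layer output: accurate, but hours old by the time it is served.
batch_view = {"product_42": 10_000}
# Speed layer output: approximate, covering only records since the last batch run.
speed_view = {"product_42": 37}

def query(key: str) -> int:
    # Serving layer: answer queries by combining both views.
    return batch_view.get(key, 0) + speed_view.get(key, 0)

print(query("product_42"))  # 10037
```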
As we all know, Hadoop, the first-generation big data computing framework, was created to solve offline computing. Apache Spark delivers excellent results for offline batch processing but has many drawbacks for real-time stream processing. After Hadoop, Spark and Storm became rivals in stream processing.
The Spark framework inherits from and builds on Hadoop MapReduce; in essence it still adopts the batch-processing idea. However, the intermediate steps of data computation have been optimized, which improves processing efficiency and gives it better computing performance than native MapReduce.
Spark provides Streaming on top of the core Spark API. Its stream-processing idea is to divide the incoming stream into small batch jobs by time interval before processing. …
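A minimal sketch of this micro-batch idea with the classic Spark Streaming (DStream) API might look like the following word count; the socket host and port are placeholders for any line-oriented text source, and a 5-second batch interval is assumed.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Cut the incoming stream into 5-second micro-batches.
sc = SparkContext(appName="MicroBatchSketch")
ssc = StreamingContext(sc, 5)

# Placeholder source: any process writing lines of text to this socket.
lines = ssc.socketTextStream("localhost", 9999)
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.pprint()  # each 5-second batch is processed as a small batch job

ssc.start()
ssc.awaitTermination()
```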
The following describes the technologies at each layer. Of course, the layers are not strictly divided; for example, Hive provides both data processing and data storage functions, but it is classified under the data analysis layer here.
1. Data acquisition and transmission layer
Spark Core is the basic execution engine of the Spark platform; all other functionality is built on this engine. It not only provides in-memory computing to improve speed, but also provides a general execution model to support various applications. In addition, users can develop applications with the Java, Scala and Python APIs. Spark Core is built on the unified abstraction of the RDD, which allows the various Spark components to be combined freely, so that different components can be used in the same application to complete complex big data processing tasks.
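As a minimal sketch of the Python API on top of Spark Core, the following builds an RDD, chains two lazy transformations and triggers execution with an action; the numbers and the filter/map logic are arbitrary examples.

```python
from pyspark import SparkContext

sc = SparkContext(appName="SparkCoreSketch")

numbers = sc.parallelize(range(1, 1001))      # create an RDD from a local range
even_squares = (
    numbers.filter(lambda n: n % 2 == 0)      # transformation (lazy)
           .map(lambda n: n * n)              # transformation (lazy)
)
print(even_squares.take(5))                   # action: computation runs here

sc.stop()
```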
What is an RDD
The RDD (Resilient Distributed Dataset) was originally designed to address the fact that existing computing frameworks are inefficient for two types of application scenarios: iterative algorithms and interactive data mining. In both scenarios, keeping data in memory can improve performance by several orders of magnitude. In iterative algorithms, such as PageRank, K-means clustering and logistic regression, intermediate results often need to be reused. The other scenario is interactive data mining, such as running multiple ad hoc queries on the same data set. In frameworks such as Hadoop, intermediate results are saved to external storage (such as HDFS), which adds extra data replication, disk I/O and serialization work and increases the load on the application. …
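The sketch below illustrates why keeping data in memory helps iterative workloads: the data set is parsed once, cached, and reused across iterations instead of being re-read from external storage each time. The HDFS path and the iteration logic are illustrative placeholders, not a real iterative algorithm such as PageRank.

```python
from pyspark import SparkContext

sc = SparkContext(appName="IterativeSketch")

# Parse the input once and keep the resulting RDD in memory.
points = (
    sc.textFile("hdfs:///data/points.txt")    # placeholder path
      .map(lambda line: float(line.strip()))
      .cache()
)

# Each iteration reuses the cached RDD instead of re-reading HDFS.
estimate = 0.0
for _ in range(10):
    estimate = 0.5 * (estimate + points.mean())

print(estimate)
sc.stop()
```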