<!DOCTYPE html>
<html>
<head>
<title>dp50mm</title>
<link href='https://fonts.googleapis.com/css?family=Dosis:400,200,600,700' rel='stylesheet' type='text/css'>
<link href="../resources/material.min.css" rel="stylesheet" type='text/css'>
<style>
body {
font-family: 'Dosis', sans-serif;
margin:0px;
padding:0px;
}
h1, h2, h3 {
font-family: 'Dosis', sans-serif;
}
.page-content {
padding:40px;
padding-left:5px;
}
.blog-post {
max-width:500px;
margin-left:10px;
}
.section {
padding:20px;
background:white;
border-radius:3px;
margin-bottom:10px;
}
.navigation-text {
margin-left:40px;
}
.mdl-layout-title {
font-family: 'Dosis', sans-serif;
font-size:40px;
margin-top:20px;
font-weight:600;
}
.mdl-layout__drawer {
background-image: url('../images/timespherecomponent.svg');
background-size:650px 650px;
background-position:-70px -250px;
background-repeat:no-repeat;
position:fixed;
}
h4 {
margin:0px;
}
.clickable-image {
border-radius:10px;
transition: border-radius 0.1s;
}
.clickable-image:hover {
border-radius:20px;
opacity:0.9;
}
.inverse {
background-color: rgb(50,50,50);
color:white;
}
</style>
</head>
<body>
<!-- No header, and the drawer stays open on larger screens (fixed drawer). -->
<div class="mdl-layout mdl-js-layout mdl-layout--fixed-drawer">
<div class="mdl-layout__drawer">
<span class="mdl-layout-title"><a href='http://www.github.com/dp50mm/'><img src='../images/github.jpg' width="40">dp50mm</a></span>
<br><br><br><br><br>
<p class='navigation-text'><a href="http://www.erwinhoogerwoord.nl">erwinhoogerwoord.nl</a></p>
<nav class="mdl-navigation">
</nav>
</div>
<main class="mdl-layout__content">
<div class="page-content mdl-color--grey-200">
<div class="blog-post">
<h1 id="AR">Phenotype Augmented Reality iOS app</h1>
<p style="color:red;">
[work-in-progress]
</p>
<h2>Overview</h2>
<ol>
<li>
Augmented reality
</li>
<li>
Smartphone platform
</li>
<li>
Interaction
</li>
<li>
Existing technology
</li>
<li>
Aruco markers
</li>
<li>
Implementation on iOS
</li>
<li>
Camera calibration
</li>
<li>
Marker detection & orientation
</li>
<li>
Image management
</li>
<li>
Neural network training
</li>
<li>
Overlays
</li>
</ol>
<h2>Tutorial</h2>
<p>
This is a step-by-step tutorial for implementing Aruco marker detection
and neural-network phenotype detection on iOS, with information overlays for <em>augmenting reality</em>.
It is written as documentation for <a href="http://www.erwinhoogerwoord.nl">my</a> graduation project.
The goal of the project is to enable new ways of making observations and drawing conclusions in experiments with growing plants, and to make these more easily shareable for DIY experimenters and plant growers, by augmenting reality with digitally stored and analysed images.
This tutorial aims to contain all relevant conceptual and technical documentation.
</p>
<h3>Observations</h3>
<p>
The use of <em>visual communication</em> in the form of illustrations goes back a long way in biology.
</p>
<img src="images/Louis_Renard_colorful_fish.jpg" width="100%" />
<p class="img-source">
<a href="https://en.wikipedia.org/wiki/Biological_illustration#/media/File:Louis_Renard_colorful_fish.jpg">Illustration from the book Histoire naturelle by Louis Renard, published in Amsterdam in 1754.</a>
</p>
<p>
Making drawings of the phenomenon under study is still common practice in high school.
</p>
<h3>Digital imaging</h3>
<p>
In the past this required considerable sketching skill. These days images can be captured quickly with the digital cameras that everyone carries every day.
This enables new modes of communication in the practice of biological research.
</p>
<h3>Visual storytelling</h3>
<p>
By compiling a series of images, an order of events can be implied and highlighted, and an experience can be recreated or synthesized.
</p>
<img src="images/FoodStoriesillustrations-10.png" width="100%" />
<h3>Interactive visuals</h3>
<p>
A step further is to create interactive graphics that let the observer navigate an experience guided by their own intentionality.
</p>
<div id="franks-interactive-drawing">
</div>
<h3>Possibilities</h3>
<p>
There are opportunities in both the capture and the navigation of images.
The application developed in this tutorial lets you more easily <em>compile series and collections of images</em> along dimensions such as location, orientation and time.
It also lets you view previously captured observations within the context of the current reality captured by the live camera.
A next step is to increase the number of dimensions for ordering images by training algorithms (neural networks) to detect phenomena in the images.
</p>
<p>
Location, orientation and time are captured through the use of <a href="http://www.uco.es/investiga/grupos/ava/node/26">Aruco markers</a> (a type of <a href="https://en.wikipedia.org/wiki/Fiducial_marker">fiducial marker</a>).
Aruco markers can be detected in an image by id as well as by pose (location + orientation).
In a garden setup the marker remains stationary with the plant, so the camera position can be deduced from the marker pose, relative to previous images containing the same marker.
</p>
<h3>Functionalities</h3>
<ul>
<li>
Re-orient images to match perspectives based on Aruco markers.
</li>
<li>
Train neural networks to detect various biological phenomena.
</li>
</ul>
<p>
Neural networks are used to detect various phenomena in the images.
The goal is to create an application in which images can be re-oriented to match previous perspectives on reality as well as the real-time video feed.
In this tutorial we go over the image-processing steps necessary to produce the functionalities described above.
</p>
<h2 id="augmented-reality">Augmented reality</h2>
<img src="/images/AR_mockup.png" width="100%" />
<p>
Augmented reality is a live, direct or indirect view of the physical, real-world environment that is augmented by computer-generated information.
</p>
<h2 id="smartphone-platform">Smartphone platform</h2>
<p>
What is commonly known as the smartphone is a computing platform ready to use for augmented reality development. It provides:
</p>
<ul>
<li>
a camera to digitally capture a perspective on reality,
</li>
<li>
a computer to analyse and augment the data,
</li>
<li>
a touch-screen for the display and input of data.
</li>
</ul>
<p>
Combining these, a live stream from the digital camera can be displayed and interacted with on the screen, together with computed augmentations.
</p>
<h2>System architecture</h2>
<p>
The following sections describe how the functionalities specified above can be built on the iOS smartphone platform.
Each of the described components requires code to enable its functionality.
First, an interaction model for the analysis views is described.
Second, the iOS camera architecture is covered.
</p>
<h2>Interaction</h2>
<p>
Image detection and analysis produce numerous augmented views.
During development and interaction it is useful to have quick access to all of them at any time.
With a horizontal scroll view in iOS, the views can be placed next to each other, as sketched below.
</p>
<img src="/images/Scrollviewillustration-09.png" width="100%" />
<h3>Views</h3>
<p>
The multiple views provide augmented perspectives on the live camera image-feed.
</p>
<h3>Image capture</h3>
<p>
With a button an image can be captured for storage.
</p>
<h2>iOS / iPhone camera</h2>
<p>
The iPhone has a powerful camera for capturing live video streams for augmented reality applications.
It is controllable through the <a href="https://developer.apple.com/av-foundation/">AV Foundation framework</a> in the iOS operating system.
First the user-facing functionalities are covered: the real-time camera stream and image capture.
Both require precise control over the camera data flow through the AVFoundation API, as well as multi-threading so that the UI always remains responsive during capture.
</p>
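<p>
As a small sketch of that multi-threading concern (assuming a dedicated serial queue, a common pattern with AVFoundation), session configuration and start-up can be kept off the main thread:
</p>
<pre><code>import AVFoundation

// Sketch: keep camera work off the main thread so the UI stays responsive.
let session = AVCaptureSession()
let sessionQueue = DispatchQueue(label: "camera.session.queue")   // serial by default

sessionQueue.async {
    session.beginConfiguration()
    // ... add inputs and outputs here (see the AVFoundation structure below) ...
    session.commitConfiguration()
    session.startRunning()   // blocking call, so it should not run on the main queue

    DispatchQueue.main.async {
        // Update the UI (e.g. attach a preview layer) back on the main queue.
    }
}
</code></pre>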
<h3>Real-time stream</h3>
<p>
The core of AR is the display and analysis of a live camera stream with overlays.
Camera sensor data needs to be captured, processed, and displayed in real-time.
</p>
<h3>Image capture</h3>
<p>
In this application, images can be captured to be integrated into the AR layers.
</p>
<h3>AVFoundation structure</h3>
<p>
The AVFoundation framework defines multiple objects to control the hardware and the data flows.
The following schematic provides an overview of the data flow between these objects.
</p>
<img src="/images/schematics/AVFoundation_structure-11.png" width="100%" />
<ul>
<li>
<a href="https://developer.apple.com/reference/avfoundation/avcapturesession">AVCaptureSession</a> is the core object where everything comes together.
</li>
<li>
<a href="https://developer.apple.com/reference/avfoundation/avcapturedevice">AVCaptureDevice</a> is the object representing the physical camera in the iOS device.
It provides the input data (both audio and video) for an AVCaptureSession object.
</li>
<li>
<a href="https://developer.apple.com/reference/avfoundation/avcaptureconnection">AVCaptureConnection</a> connects the input to an output through the AVCaptureSession.
</li>
<li>
<a href="https://developer.apple.com/reference/avfoundation/avcapturephotooutput">AVCapturePhotoOutput</a> provides the software interface for capture workflows related to still photography.
</li>
</ul>
<p>
This defines the overall structure of the AVFoundation framework.
To implement the specified functionalities of <em>real-time augmented streaming</em> and <em>image capture</em>, the two implementations need to be programmed in such a way that they do not compete with each other for camera access and processing capacity.
</p>
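<p>
A minimal sketch of how these objects could be wired together in Swift (the AVCaptureVideoDataOutput for the real-time stream is my assumption here, and error handling is omitted):
</p>
<pre><code>import AVFoundation

// Sketch: wire the AVFoundation objects from the schematic into one capture session.
func makeCaptureSession() -> AVCaptureSession? {
    let session = AVCaptureSession()

    // AVCaptureDevice represents the physical (back) camera.
    guard let camera = AVCaptureDevice.default(for: .video),
          let input = try? AVCaptureDeviceInput(device: camera) else { return nil }

    session.beginConfiguration()
    if session.canAddInput(input) { session.addInput(input) }

    // Still photo capture.
    let photoOutput = AVCapturePhotoOutput()
    if session.canAddOutput(photoOutput) { session.addOutput(photoOutput) }

    // Video frames for the real-time augmented stream (covered in the next section).
    let videoOutput = AVCaptureVideoDataOutput()
    if session.canAddOutput(videoOutput) { session.addOutput(videoOutput) }

    session.commitConfiguration()
    // AVCaptureConnection objects between the input and the outputs are created implicitly by addOutput.
    return session
}
</code></pre>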
<h3>Real-time augmented streaming implementation</h3>
<p>
</p>
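<p>
A minimal sketch of one possible approach, assuming an AVCaptureVideoDataOutput that delivers frames to a delegate on a background queue (class and queue names here are placeholders):
</p>
<pre><code>import AVFoundation

// Sketch: receive live camera frames for analysis and augmentation.
class FrameReceiver: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let videoQueue = DispatchQueue(label: "camera.video.queue")

    func attach(to session: AVCaptureSession) {
        let output = AVCaptureVideoDataOutput()
        output.alwaysDiscardsLateVideoFrames = true      // drop frames rather than lag behind
        output.setSampleBufferDelegate(self, queue: videoQueue)
        if session.canAddOutput(output) { session.addOutput(output) }
    }

    // Called for every frame, off the main thread.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // Analyse pixelBuffer here (e.g. marker detection), then hand the
        // augmented result to the UI on the main queue.
        _ = pixelBuffer
    }
}
</code></pre>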
<h3>Capturing & storing photos</h3>
<p>
To access the camera on various iOS devices an
<a href="https://developer.apple.com/reference/avfoundation/avcapturedevice">AV capture device</a> object is created in the code.
</p>
<p>
When you want to capture a photo, you call the function <a href="https://developer.apple.com/reference/avfoundation/avcapturephotooutput/1648765-capturephoto">capturePhoto()</a> and pass it two objects (a sketch follows the list):
</p>
<ul>
<li>
Photo Settings.
<br />
<a href="https://developer.apple.com/reference/avfoundation/avcapturephotosettings">AVCapturePhotoSettings</a>
</li>
<li>
The photo capture delegate.
<br />
<a href="https://developer.apple.com/reference/avfoundation/avcapturephotocapturedelegate">AVCapturePhotoCaptureDelegate</a>
</li>
</ul>
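<p>
A minimal sketch of that call (iOS 11+ delegate callback), assuming the photo output created when the session was configured:
</p>
<pre><code>import AVFoundation

// Sketch: capture a still photo and receive the result through a delegate.
class PhotoCapturer: NSObject, AVCapturePhotoCaptureDelegate {
    let photoOutput: AVCapturePhotoOutput

    init(photoOutput: AVCapturePhotoOutput) {
        self.photoOutput = photoOutput
    }

    func capture() {
        let settings = AVCapturePhotoSettings()          // default format and flash settings
        photoOutput.capturePhoto(with: settings, delegate: self)
    }

    // Delegate callback with the captured photo.
    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        guard error == nil, let data = photo.fileDataRepresentation() else { return }
        // Store the data (e.g. write it to disk or add it to the observation collection).
        _ = data
    }
}
</code></pre>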
<h2 id="libraries">Image analysis</h2>
<p>
This application of computer vision has been implemented in various open and closed libraries.
To detect the markers, the OpenCV library is used together with its Aruco module:
</p>
<ul>
<li>
<a href="http://opencv.org/">OpenCV</a>
</li>
<li>
<a href="http://www.uco.es/investiga/grupos/ava/node/26">Aruco</a>
</li>
</ul>
<h2>Aruco markers</h2>
<p>
<a href="https://github.com/opencv/opencv_contrib/blob/master/modules/aruco/tutorials/aruco_detection/aruco_detection.markdown">Aruco tutorial</a>
</p>
<h3>Pose estimation</h3>
<p>
We want to know the rotation and translation of each detected marker.
The function estimatePoseSingleMarkers gives us these.
It takes the following parameters: corners, markerLength, cameraMatrix, distCoeffs, rvecs, tvecs.
The rvecs and tvecs are arrays of vectors, one per detected marker.
</p>
<ul>
<li>
rvec: Rotation vector
</li>
<li>
tvec: Translation vector
</li>
</ul>
<p>
The detectMarkers function provides the array of marker corners together with their ids.
So to create the final output of marker poses we go through the following steps (a Swift-side sketch follows the list):
</p>
<ol>
<li>
get the image
</li>
<li>
get the camera calibration parameters
</li>
<li>
detect the markers
</li>
<li>
loop through the markers for pose estimation
</li>
<li>
return the poses to the main application
</li>
</ol>
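<p>
Expressed from the Swift side of the app, and assuming a hypothetical wrapper interface around the OpenCV/Aruco calls (the wrapper itself is discussed in the next section), the steps could look roughly like this:
</p>
<pre><code>import UIKit

// Sketch only: ArucoWrapping and MarkerPose are hypothetical stand-ins for the
// Objective-C/OpenCV wrapper described below; they are not an existing API.
struct MarkerPose {
    let id: Int32
    let rvec: [Double]   // rotation vector
    let tvec: [Double]   // translation vector
}

// Hypothetical interface that the Objective-C wrapper would expose to Swift.
protocol ArucoWrapping {
    func cameraMatrix() -> [Double]                      // 3x3, row-major
    func distortionCoefficients() -> [Double]            // 5 coefficients
    func estimatePoses(in image: UIImage,
                       markerLength: Double,
                       cameraMatrix: [Double],
                       distCoeffs: [Double]) -> [MarkerPose]
}

func detectPoses(in image: UIImage, wrapper: ArucoWrapping) -> [MarkerPose] {
    // 1. get the image (passed in from the camera stream or a captured photo)
    // 2. get the camera calibration parameters (cameraMatrix and distCoeffs)
    let cameraMatrix = wrapper.cameraMatrix()
    let distCoeffs = wrapper.distortionCoefficients()
    // 3.-4. detect the markers and estimate a pose (rvec + tvec) for each of them
    let poses = wrapper.estimatePoses(in: image,
                                      markerLength: 0.05,   // marker side length in meters (assumed)
                                      cameraMatrix: cameraMatrix,
                                      distCoeffs: distCoeffs)
    // 5. return the poses to the main application
    return poses
}
</code></pre>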
<h2>iOS Application architecture</h2>
<p>
</p>
<p>
OpenCV is a C++ library; on iOS it can be used through Objective-C++.
The current standard for programming on iOS is the Swift programming language.
To use the capabilities of OpenCV from Swift, wrapper code needs to be written:
functions need to be callable from Swift code, and data needs to be returned from Objective-C to the Swift callers.
</p>
<h3>Calling Objective-C functions from Swift</h3>
<h3>Returning data to Swift functions</h3>
<p>
<a href="https://developer.apple.com/library/content/documentation/Swift/Conceptual/BuildingCocoaApps/InteractingWithObjective-CAPIs.html#//apple_ref/doc/uid/TP40014216-CH4-ID35">Interacting With Objective-C Apis</a>
<br />
</p>
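<p>
As a hedged illustration of both points, assuming a hypothetical Objective-C++ class named OpenCVWrapper that is exposed to Swift through the project's bridging header (this is not the project's actual wrapper):
</p>
<pre><code>// Assumes an Objective-C++ class declared roughly like this and
// #import-ed in the bridging header:
//
//   // OpenCVWrapper.h
//   @interface OpenCVWrapper : NSObject
//   - (NSArray *)detectMarkerIdsIn:(UIImage *)image;   // implemented in a .mm file using OpenCV
//   @end

import UIKit

func markerIds(in image: UIImage) -> [Int] {
    // The Objective-C class can be instantiated and called directly from Swift.
    let wrapper = OpenCVWrapper()
    // The NSArray returned from Objective-C is bridged into a Swift array here.
    let ids = wrapper.detectMarkerIds(in: image) as? [NSNumber] ?? []
    return ids.map { $0.intValue }
}
</code></pre>
<p>
The exact Swift name of the Objective-C method depends on how the importer maps the selector; the pattern of calling the wrapper and converting the bridged Foundation types is the relevant part.
</p>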
<h2 id="Camera-calibration">Camera calibration</h2>
<p>
The lens system of a camera distorts the captured image.
The pose estimation function of the Aruco module requires parameters to correct for that distortion.
These are a <em>cameraMatrix</em> object and <em>distCoeffs</em> object as defined by the <a href="http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html">OpenCV calib3d</a> camera calibration module.
In this part of the tutorial we go over how these are determined for a camera.
</p>
<h3>Distortion</h3>
<p>
The lens distortion can be categorized into two major components:
</p>
<ul>
<li>
radial distortion
</li>
<li>
tangential distortion
</li>
</ul>
<p>
OpenCV has tools to correct for this distortion: a camera matrix and a set of distortion coefficients,
algorithms to determine these parameters for a lens, and functions to apply the correction in applications.
</p>
<h3>Camera Matrix</h3>
<p>
The camera matrix contains the following parameters, stored in a 3x3 matrix:
</p>
<ul>
<li>
fx (focal length in pixel units)
</li>
<li>
fy (focal length in pixel units)
</li>
<li>
cx (center x)
</li>
<li>
cy (center y)
</li>
</ul>
<p>
These parameters scale with the image resolution.
</p>
<h3>Distortion coefficients</h3>
<p>
These are stored in a 5x1 matrix: k1, k2 and k3 (radial) and p1, p2 (tangential), as illustrated in the sketch below.
</p>
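<p>
As an illustration of the shapes involved (the numbers below are placeholders, not a real calibration result):
</p>
<pre><code>// Sketch with placeholder values.
// Camera matrix (3x3):
//   [ fx   0  cx ]
//   [  0  fy  cy ]
//   [  0   0   1 ]
let cameraMatrix: [[Double]] = [
    [1000.0,    0.0, 640.0],   // fx, 0, cx
    [   0.0, 1000.0, 360.0],   // 0, fy, cy
    [   0.0,    0.0,   1.0],
]

// Distortion coefficients (5x1): k1, k2 (radial), p1, p2 (tangential), k3 (radial).
let distCoeffs: [Double] = [0.1, -0.25, 0.001, 0.0005, 0.0]
</code></pre>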
<div id="container">
</div>
<h2>Augmentation layers</h2>
<p>
A number of augmentations can be applied to the data stream:
</p>
<ul>
<li>
The input data stream can be filtered; a simple example is cropping the photo.
</li>
<li>
The input data stream can be transformed; for example the photo can be made black & white (a sketch of this follows below).
</li>
<li>
The input data stream can be analysed; for example markers can be detected.
</li>
<li>
The output data stream can be adapted as well; extra information can be added, for example a timer.
</li>
</ul>
<p>
In this application, previously captured photos (i.e. observations) of the same marker should be recalled and displayed
within the context of the marker currently visible in the real-time camera stream.
</p>
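<p>
As one small example of a transform step, a black & white conversion of a camera frame could be done with Core Image (the filter choice here is just an illustration):
</p>
<pre><code>import CoreImage
import CoreVideo

// Sketch: convert a camera frame to black & white before display.
func grayscale(_ pixelBuffer: CVPixelBuffer) -> CIImage {
    let image = CIImage(cvPixelBuffer: pixelBuffer)
    // CIPhotoEffectMono is a built-in Core Image monochrome filter.
    let filter = CIFilter(name: "CIPhotoEffectMono")!
    filter.setValue(image, forKey: kCIInputImageKey)
    return filter.outputImage ?? image
}
</code></pre>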
<h3>3D Graphics</h3>
<p>
To work with and display 3D graphics at a high level, Apple provides the SceneKit framework.
</p>
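<p>
A minimal sketch of a SceneKit scene that could hold such 3D overlays (the geometry and positions are placeholders):
</p>
<pre><code>import SceneKit

// Sketch: a minimal SceneKit scene for 3D overlay content.
func makeOverlayScene() -> SCNScene {
    let scene = SCNScene()

    // A camera node so the scene can be rendered from a viewpoint.
    let cameraNode = SCNNode()
    cameraNode.camera = SCNCamera()
    cameraNode.position = SCNVector3(x: 0, y: 0, z: 0.5)
    scene.rootNode.addChildNode(cameraNode)

    // A placeholder box where an overlay (e.g. a recalled observation) could be anchored.
    let box = SCNNode(geometry: SCNBox(width: 0.1, height: 0.1, length: 0.1, chamferRadius: 0.01))
    scene.rootNode.addChildNode(box)

    return scene
}

// The scene is displayed by assigning it to the scene property of an SCNView.
</code></pre>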
</div>
</div>
</main>
</div>
</body>
</html>