Fling
Fling is a double entendre. As a transitive verb, it conjures a toss. As a noun, it evokes a lack of inhibition, an easy association between acquaintances. Fling 1.0 was introduced in 2008 by Flingo.TV, and by the early 2010s, it was deployed in all Vizio televisions. It provided similar, although not identical, functionality to AirPlay, which arrived in 2010 and superseded Fling.
Fling 1.0 allowed a person to cast (or more generally "fling") a video URL to a TV or other media device from a web page or mobile app, upon which the media device would play or queue the video. Fling required no usernames, passwords, short codes, or QR codes. Unlike AirPlay, Fling achieved this without requiring access to network multicast or any support from the browser or the operating system; it relied entirely on JavaScript. The entire system was designed to operate within the constraints of a web or web-like security sandbox.
Discovery was accomplished by assuming that the device initiating a fling shared a public IP address with the device receiving the fling, such as a TV (hereafter referred to as the media device). The JavaScript in the web browser could then use cross-site scripting to open a connection to the media device using the media device's private IP address, thereby confirming that the web browser and the media device were in fact on the same private network.
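The shared-public-IP discovery step can be sketched as a small rendezvous registry. This is only an illustration of the idea, not Flingo.TV's actual implementation: the class `DiscoveryRegistry`, its method names, and the example addresses are all hypothetical.

```python
from collections import defaultdict


class DiscoveryRegistry:
    """Hypothetical rendezvous service that groups media devices by public IP."""

    def __init__(self):
        # Maps an observed public IP to the private IPs announced behind it.
        self._devices = defaultdict(set)

    def announce(self, public_ip: str, private_ip: str) -> None:
        """Called by a media device; public_ip is the address the server observed."""
        self._devices[public_ip].add(private_ip)

    def candidates(self, public_ip: str) -> list[str]:
        """Private IPs of media devices seen behind the same public IP as the browser."""
        return sorted(self._devices.get(public_ip, ()))


registry = DiscoveryRegistry()
registry.announce("203.0.113.7", "192.168.1.20")  # TV in household A
registry.announce("198.51.100.9", "10.0.0.5")     # TV in household B

# A browser in household A only learns about the TV that shares its public IP.
print(registry.candidates("203.0.113.7"))
```

The browser would then attempt a connection to each candidate's private IP address; only a device on the same private network would answer, which is the confirmation step described above.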
Clearly, times have changed. AirPlay and Google Chromecast are widely supported. They both use multicast for service discovery and require support from the browser or operating system. Carrier-Grade Network Address Translators (CG-NATs) now often place many households behind a single public IP address. The security sandbox has also evolved to limit the use of cross-site scripting.
Fling 1.0 as initially designed would not work with the modern World Wide Web.
Rather than attempting to replace AirPlay or Chromecast, in the Fling 2.0 project, we provide capabilities that AirPlay and Chromecast do not, and that could, if Apple or Google so desired, be integrated into AirPlay or Chromecast to add more sensible access control and remove friction.
Fling 2.0
Fling 2.0 adds spatial awareness and cross-network flinging to systems like AirPlay and Chromecast. It aims to reduce friction in the process of flinging. For the user, this means pressing a button in a mobile app, selecting videos, and watching them play on the TV. For the owner, the TV accepts flings according to policies that align with real-world boundaries such as "within the room," "on the same floor," or "within the same house." This leads us to the tagline:
Fling that just works...unless it shouldn't.
"Just works...": No usernames, passwords, short codes, QR codes, or network configuration. No operating system, hardware, or any other modifications to existing Apple or Android mobile devices.
Human-understandable Access Control.
"... unless it shouldn't": Putting your TV on your apartment building's free shared Wi-Fi shouldn't allow your neighbor to fling inappropriate content to your TV. Similarly, the dormmate down the hall should not be able to fling across the dorm's shared Wi-Fi to your TV. These are examples where real-world boundaries establish more natural policy inputs than network boundaries. If you, as the TV owner, were to constrain flings to the same room or to within a number of walls, Fling 2.0 would deny your dormmate's access to your TV.
If we wish to rely on real-world boundaries as inputs to policy decisions, it seems only natural that we should be able to ignore network boundaries. In particular, Fling should work without requiring the flinging device to join the same network as the media device, and preferably should even allow a device that has only cellular data to fling. After all, if someone is in the same room as the TV, you have a real-world policy enforcement mechanism: if a visitor to your home flings inappropriate content to your TV, kick the visitor out. Assuming most people invited into your home are trustworthy, forcing the visitor to join your home network to enable flings represents unnecessary friction.
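A policy built on real-world boundaries, as described above, might be expressed roughly as follows. This is a minimal sketch under assumptions: the types `Boundary`, `SpatialEstimate`, and `OwnerPolicy`, and the idea that the spatial-awareness pipeline reports a boundary and a wall count, are hypothetical and not part of any published Fling 2.0 interface.

```python
from dataclasses import dataclass
from enum import IntEnum


class Boundary(IntEnum):
    """Real-world boundaries, ordered from most to least restrictive."""
    SAME_ROOM = 1
    SAME_FLOOR = 2
    SAME_HOUSE = 3
    OUTSIDE = 4


@dataclass
class SpatialEstimate:
    """Hypothetical output of the spatial-awareness pipeline for one device."""
    boundary: Boundary   # tightest boundary containing both the device and the TV
    walls_between: int   # estimated number of walls between the device and the TV


@dataclass
class OwnerPolicy:
    """Limits chosen by the TV owner; defaults to same room, no walls."""
    max_boundary: Boundary = Boundary.SAME_ROOM
    max_walls: int = 0


def allow_fling(estimate: SpatialEstimate, policy: OwnerPolicy) -> bool:
    """Accept a fling only if the device is inside the owner's chosen limits."""
    return (estimate.boundary <= policy.max_boundary
            and estimate.walls_between <= policy.max_walls)


policy = OwnerPolicy()  # same room only
visitor = SpatialEstimate(Boundary.SAME_ROOM, walls_between=0)
dormmate = SpatialEstimate(Boundary.SAME_FLOOR, walls_between=3)
print(allow_fling(visitor, policy), allow_fling(dormmate, policy))
```

Note that the decision never consults which network either device is on, so a visitor with only cellular data is treated the same as one on the home Wi-Fi.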
How It Works
Imagine this extreme scenario: a war driver flings unwanted content to all fling-enabled TVs as they pass directly in front of each house. If each house were to use WPA2/3 (Wi-Fi Protected Access), and each fling-enabled TV were to require that a flinging device be on the same Wi-Fi network as the TV, then the combination of defenses would block the war driver's access. But if we require the flinging device to be on the same Wi-Fi network, then every visitor to your home would have to authenticate to your home network before the TV would permit fling access. This is another scenario that is reasonably solved using spatial awareness. If the TV recognizes when the flinging device is inside the house, the war driver would be blocked, and yet visitors could still fling without requiring Wi-Fi authentication.
For the example above, a TV with two antennas placed sufficiently far apart could triangulate the car's position, infer its distance from the TV, and impose a distance-based policy that aligns with the perimeter of the house.
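The geometry behind such a distance-based policy can be sketched with the law of sines. The sketch assumes, purely for illustration, that a bearing to the emitter is available at each antenna position and that there are no obstructions; how those bearings would be measured in practice is not specified here. The function names and the 8 m perimeter are hypothetical.

```python
import math


def triangulate(baseline_m: float, alpha_rad: float, beta_rad: float) -> tuple[float, float]:
    """Locate an emitter from two bearings.

    Antenna 1 sits at (0, 0) and antenna 2 at (baseline_m, 0).
    alpha_rad is the interior angle at antenna 1 between the baseline and the emitter;
    beta_rad is the interior angle at antenna 2. Returns the emitter's (x, y) in metres.
    """
    gamma = math.pi - alpha_rad - beta_rad  # angle at the emitter
    if gamma <= 0:
        raise ValueError("bearings do not intersect in front of the TV")
    d1 = baseline_m * math.sin(beta_rad) / math.sin(gamma)  # law of sines
    return d1 * math.cos(alpha_rad), d1 * math.sin(alpha_rad)


def inside_perimeter(position: tuple[float, float], max_distance_m: float) -> bool:
    """Distance-based policy: accept only emitters within max_distance_m of antenna 1."""
    return math.hypot(*position) <= max_distance_m


# A car about 10 m in front of a TV whose antennas are 1 m apart.
bearing = math.atan2(10.0, 0.5)
car = triangulate(1.0, bearing, bearing)
print(car, inside_perimeter(car, max_distance_m=8.0))
```

With a short baseline the angle at the emitter becomes very small for distant devices, so small bearing errors translate into large distance errors; this is one reason antenna separation matters.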
Using Machine Learning with Radical Data Augmentation
Inferring the relative location of a single flinging device from the TV in an environment without walls can be achieved somewhat easily using triangulation. The problem becomes substantially more difficult with the introduction of multiple obstructions such as walls, doorways, and furniture, or with the introduction of multiple devices using the same frequency bands.
To handle a wide range of room, furniture, and building layouts, differing building materials, and multiple devices communicating in different or the same frequency bands, we employ machine learning. Machine learning requires data, and dealing with complex environments requires a lot of it. Setting up an apparatus to measure signal distortion in many buildings with many furniture configurations and other sources of noise is cost- and time-prohibitive. Instead, we aim to use extensive simulation to augment real-world data with synthetic data.
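To give a flavor of synthetic data generation, the sketch below draws labelled samples from a log-distance path-loss model with a fixed per-wall attenuation. This is a deliberately simple stand-in for the far richer simulation the project envisions, and every parameter value (transmit power, path-loss exponent, wall loss, noise level) is an illustrative assumption rather than a measured figure.

```python
import math
import random


def synth_rssi(distance_m, walls, *, tx_dbm=20.0, pl0_db=40.0, exponent=3.0,
               wall_loss_db=5.0, shadow_sigma_db=4.0, rng=random):
    """Simulated received power (dBm): log-distance loss, wall loss, Gaussian shadowing."""
    path_loss = pl0_db + 10.0 * exponent * math.log10(max(distance_m, 1.0))
    return tx_dbm - path_loss - walls * wall_loss_db + rng.gauss(0.0, shadow_sigma_db)


def make_dataset(n, seed=0):
    """Generate n labelled samples of (rssi_antenna_1, rssi_antenna_2, walls)."""
    rng = random.Random(seed)  # seeded so that a dataset can be regenerated exactly
    rows = []
    for _ in range(n):
        distance = rng.uniform(1.0, 20.0)
        walls = rng.randint(0, 3)
        rows.append((synth_rssi(distance, walls, rng=rng),
                     synth_rssi(distance, walls, rng=rng),
                     walls))
    return rows


for row in make_dataset(3):
    print(row)
```

A classifier trained on such rows would learn to predict the wall count from the two readings; real-world measurements would then be mixed in to keep the model anchored to actual buildings.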
Reducing Costs
In this project we consider cost trade-offs. We assume that the cost of computation necessary for machine learning will continue to decrease, while the cost of antennas has largely plateaued. As such, we aim to work with off-the-shelf antennas, preferably identical to those commonly used in televisions today, but we can use an FPGA to implement two parallel pipelines: one for standard Wi-Fi and one for spatial awareness. We can later reduce the cost by replacing the FPGA with cheaper ASICs.
But antennas used today are designed to communicate over tight frequency bands. If we wish the TV to locate a nearby mobile device using cellular data, the TV needs to detect signals emitted across a wide frequency range, potentially from 600 or 700 MHz to over 6 GHz. We need to do this while remaining inside the form factor of today's televisions, without sacrificing Wi-Fi performance, and without appreciably increasing the Bill of Materials (BOM).
Many TVs use more than one antenna. A common configuration is to use two antennas to enable 2x2 MIMO, potentially with beam-forming to direct the signal toward the Wi-Fi base station. For beam-forming, antennas tend to be placed around half a wavelength apart. For the 2.4 GHz band, this is about 6.25 cm (2.46 inches). For 6 GHz, this is about 2.5 cm (0.98 inches). Separating the antennas slightly more may enable better triangulation, at the expense of eliminating beam-forming and thus impacting Wi-Fi performance.
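The half-wavelength spacing follows directly from the free-space wavelength, which is the speed of light divided by the frequency. A short sketch for checking the spacing at any frequency of interest (the function name is ours):

```python
SPEED_OF_LIGHT_M_S = 299_792_458
CM_PER_INCH = 2.54


def half_wavelength_cm(frequency_hz: float) -> float:
    """Half of the free-space wavelength at frequency_hz, in centimetres."""
    return SPEED_OF_LIGHT_M_S / frequency_hz / 2 * 100


for label, freq in [("2.4 GHz", 2.4e9), ("5 GHz", 5e9), ("6 GHz", 6e9)]:
    cm = half_wavelength_cm(freq)
    print(f"{label}: {cm:.2f} cm ({cm / CM_PER_INCH:.2f} in)")
```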
If we are willing to increase the cost slightly to increase the accuracy of spatial awareness, we could keep two antennas at a proper distance from each other to perform Wi-Fi communications with beam-forming, and then add a third antenna at the far end of the TV to increase the baseline used for triangulation, and to increase path diversity to better detect devices affected by obstructions.
For a different trade-off, one that favors frequency response across a wide frequency range, we could add a third or fourth antenna that is designed for lower or higher frequencies. Of course, each of these adds BOM cost and occupies more space within the TV form factor.
Fling Jukebox: Recommendations for Everyone Present
Unlike Spotify, Pandora, Amazon Music, or YouTube, Fling Jukebox selects music based on everyone present by detecting the mobile devices they carry. As people leave and enter its spatial awareness, the Jukebox adapts the music selection accordingly.
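One simple way to select music for everyone present is to average each listener's preference scores and pick the highest-scoring track. The sketch below assumes this averaging strategy for illustration only; the function `pick_track`, the device labels, and the scores are hypothetical, and a real recommender could aggregate preferences very differently.

```python
def pick_track(present, preferences, catalog):
    """Return the catalog track with the highest mean score among listeners present."""
    if not present:
        return None

    def mean_score(track):
        # A listener with no score for a track contributes 0.0.
        return sum(preferences[p].get(track, 0.0) for p in present) / len(present)

    return max(catalog, key=mean_score)


preferences = {
    "device-a": {"jazz-01": 0.9, "rock-07": 0.2},
    "device-b": {"jazz-01": 0.4, "rock-07": 0.8},
    "device-c": {"jazz-01": 0.8, "rock-07": 0.1},
}
catalog = ["jazz-01", "rock-07"]

print(pick_track({"device-b"}, preferences, catalog))              # one listener
print(pick_track({"device-a", "device-b"}, preferences, catalog))  # a second arrives
```

Re-running the selection whenever a device enters or leaves the TV's spatial awareness yields the adaptive behavior described above.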
Fling Jukebox is just one example of what can be done with spatial awareness. Any equipped TV with a recommender system could adapt based on who is in front of the TV or in the room. Currently, streaming providers like Netflix, Amazon Prime, Disney+, Hulu, HBO Max, Apple TV+, Peacock, and Paramount+ can only identify the account holder or the selected user profile, not everyone present in the room.
Security
Fling 2.0 does not decrypt packets communicated via Wi-Fi or cellular data. Fling 2.0 security revolves around providing adequate access control to fling-enabled media devices.
Fling 2.0 uses features derived from spatial awareness, such as "in the same room," as policy inputs for making access control decisions. This does not mean that Fling makes access control policy decisions based only on spatial awareness.
Sometimes there are specific, recurring problematic flinging devices that may enter the spatial awareness of the fling-enabled media device. A media device owner should be able to block such devices, but this is only possible if there is a consistent way to identify the offending devices. The direction with modern mobile devices has been toward making it harder to identify them. This trend moves at cross-purposes with access control. The traditional solution would be to introduce authentication, e.g., as already exists for secured Wi-Fi networks, but in so doing, we add the friction we are trying to avoid. Fling may not be able to deal with a sophisticated attacker that operates in close proximity to a fling-enabled media device, but not all hope is lost for typical, unsophisticated offenders.
Many Wi-Fi devices use MAC address randomization. However, Apple and Android devices, as well as many Linux phones, use a consistent network-specific MAC address once connected to a known Wi-Fi network. Apple calls such MAC addresses Private Wi-Fi Addresses. Moreover, all Wi-Fi standards require that MAC addresses be transmitted unencrypted. Since media devices are typically stationary, it seems reasonable that the set of Wi-Fi networks in proximity to any given media device is relatively stable, and thus any device connected to one of these networks is likely to have a consistent private Wi-Fi address that can be used as an input to an access control policy decision.
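A device-level block list keyed on observed MAC addresses might look like the sketch below. The helper names and the `DeviceBlocklist` class are hypothetical. The one standards-based detail is the locally administered bit (0x02 in the first octet), which randomized and private addresses set; it lets the TV tell a private address from a manufacturer-assigned one.

```python
def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit (0x02 of the first octet) is set."""
    first_octet = int(normalize_mac(mac).split(":")[0], 16)
    return bool(first_octet & 0x02)


class DeviceBlocklist:
    """Hypothetical owner-managed list of MAC addresses that may not fling."""

    def __init__(self):
        self._blocked = set()

    def block(self, mac: str) -> None:
        self._blocked.add(normalize_mac(mac))

    def is_blocked(self, mac: str) -> bool:
        return normalize_mac(mac) in self._blocked


blocklist = DeviceBlocklist()
blocklist.block("A2:11:22:33:44:55")  # an offending device's private address

print(is_locally_administered("A2:11:22:33:44:55"))
print(blocklist.is_blocked("a2-11-22-33-44-55"))
```

Such a list only helps while the offending device keeps the same network-specific address; a device that rotates its address would evade it, which is consistent with the limits on sophisticated attackers noted above.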
Cellular devices are a little harder to identify. Most cellular devices have more than one identifier used by the cellular network to identify a device or a subscriber. For example, most devices have an International Mobile Equipment Identity (IMEI). In modern cellular networks, the IMEI is only sent in the clear when a cellular device first joins the cellular network. Subscribers are typically identified based on temporary identifiers (TMSI or GUTI). Temporary identifiers are generally not useful for making access control decisions. Where features from spatial awareness are insufficient, providing adequate access control for offending cellular devices is an ongoing research question.
Privacy
Adding spatial awareness to a TV brings new privacy concerns. Should a TV know all the mobile devices in its vicinity? Many streaming providers would value this data. We acknowledge the privacy challenges, and attempt to balance them with benefits to users.
Access control based on real-world boundaries can be provided without exposing the data to applications. However, enabling recommendations based on those present requires exposing some data. Rather than applying a blanket policy of disallowing it, why not clearly communicate what data would be shared, with whom, and how it would benefit the user? Then, let the user decide.