Multilane roads extracted from the OpenStreetMap urban road network using random forests. DOI: 10.1111/tgis.12514




1. High-precision 3D road information plays an important role in intelligent transportation and in urban planning and management. Mobile laser scanning systems can quickly acquire 3D information about street scenes, but it is difficult to extract complete and accurate road boundaries directly from the raw point cloud because of the large data volume, occlusion, and the complexity of urban street scenes. OpenStreetMap (OSM), a source of crowdsourced geographic data, can be used to assist road extraction from mobile laser scanning point clouds. This paper proposes a 3D road boundary extraction algorithm that integrates two-dimensional OSM vector data with vehicle-borne laser point cloud data. First, a point cloud feature map is constructed by analyzing the spatial distribution characteristics of the scanning points. OSM then provides the initial position, and road boundaries are extracted from the point cloud feature map using an improved active contour model. Experiments were carried out on StreetMapper data. The results show that the proposed algorithm can compensate for boundary information that is missing because of point cloud defects, and can extract 3D road boundaries accurately and completely, demonstrating strong robustness and applicability.
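The abstract above does not say how the point cloud feature map is built. A common choice, assumed here purely for illustration, is to rasterize the scanning points into a horizontal grid and record per-cell statistics such as point count and height range (maximum z minus minimum z), because curbs appear as cells with a small but non-zero height step. The function names, the 0.2 m cell size, and the curb thresholds are hypothetical and are not taken from that paper.

```python
import math
from collections import defaultdict


def feature_map(points, cell_size=0.2):
    """Rasterize (x, y, z) points into a 2D grid of per-cell statistics.

    Returns a dict mapping (col, row) -> (point_count, height_range).
    cell_size (metres) is an illustrative assumption.
    """
    count = defaultdict(int)
    zmin = {}
    zmax = {}
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        count[key] += 1
        zmin[key] = min(zmin.get(key, z), z)
        zmax[key] = max(zmax.get(key, z), z)
    return {key: (count[key], zmax[key] - zmin[key]) for key in count}


def curb_cells(fmap, low=0.05, high=0.30):
    """Cells whose height range looks like a curb step (hypothetical thresholds, metres)."""
    return sorted(key for key, (_, dz) in fmap.items() if low <= dz <= high)
```

An active contour initialized from the OSM centerline would then be attracted toward cells flagged in this way; that step is not sketched here.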




The volunteered geographic information (VGI) collected in OpenStreetMap (OSM) has been used in many applications. Extracting multilane roads and representing them at a high level of detail play important roles in automated cartographic generalization. An accurate and detailed extraction process benefits geographic analysis, urban region division, road network construction, and transportation application services. The road networks in OSM have a high level of detail and complex structures; however, they also include many duplicate lines, which reduce efficiency and make multilane roads harder to extract. To resolve these problems, this work proposes a machine-learning-based approach in which the road networks are first converted from lines to polygons. Then, various geometric descriptors, including compactness, width, circularity, area, perimeter, complexity, parallelism, shape descriptor, and width-to-length ratio, are used to train a random forest (RF) classifier and identify candidates. Finally, another RF is trained to evaluate the candidates using all the geometric descriptors and topological features; the outputs of this second trained RF are the predicted multilane roads. An experiment using OSM data from Beijing, China validated the proposed method, which achieves highly effective performance when extracting multilane roads from OSM.
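Several of the geometric descriptors listed above can be computed directly from a polygon's vertex ring. The sketch below uses one common definition of compactness (4πA/P², equal to 1 for a circle) and estimates width and length from an "equivalent rectangle" with the same area A and perimeter P. These formulas are assumptions chosen for illustration; the exact definitions used in the paper (and its circularity, complexity, and parallelism measures) are not given in this section.

```python
import math


def area_perimeter(ring):
    """Shoelace area (absolute value) and perimeter of a closed ring [(x, y), ...]."""
    twice_area = 0.0
    perim = 0.0
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perim += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perim


def geometric_descriptors(ring):
    area, perim = area_perimeter(ring)
    compactness = 4.0 * math.pi * area / perim ** 2  # 1.0 for a circle
    # Equivalent rectangle: L + W = P/2 and L * W = A, so L and W are the roots
    # of t^2 - (P/2) t + A = 0. The discriminant is clamped at 0 for round shapes.
    half = perim / 2.0
    root = math.sqrt(max(half * half - 4.0 * area, 0.0))
    length = (half + root) / 2.0
    width = (half - root) / 2.0
    return {
        "area": area,
        "perimeter": perim,
        "compactness": compactness,
        "width": width,
        "width_to_length": width / length if length > 0 else 0.0,
    }
```

For a long, thin polygon between two carriageways (say 100 m by 10 m) this yields a width of 10, a width-to-length ratio of 0.1, and low compactness, which is the kind of signal a classifier can exploit.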



As information technology has improved, cartography has largely shifted from digitization to informatization and has begun to focus on automated mapping requirements, including multiscale representation of spatial data in geographic information science (GIS), production of map series at multiple scales, updating of multiscale geospatial databases, and so on. This process is termed "smart cartography" and has been widely researched (Wang, 2010). One active research topic is automatically deriving small-scale road networks from large-scale ones, since roads are the most important feature on many maps. Multiscale road network cartography is a core component of many analysis and application studies. Because multilane roads play an important role in urban traffic patterns at every level from fine to coarse, their functional hierarchy is crucial (Heinzle & Anders, 2007; Heinzle, Anders, & Sester, 2006; Zhang, 2004).


In recent years, volunteered geographic information (VGI) such as that of the OpenStreetMap (OSM) project has been widely used for updating spatial databases, for spatial analysis, and in many other applications (Xu, Chen, Xie, & Wu, 2017), because every user can become a contributor (Goodchild, 2007; Li & Qian, 2010). The development of global positioning system (GPS) devices, which can acquire personal geographical location information (Zou, Yu, & Cao, 2017), has made highly detailed OSM road network data easy to obtain. Multiscale representation of road networks and the production of multiscale maps have opened many new research opportunities. Such studies help advance the automatic synthesis of road networks and improve the production of map data. The OSM wiki defines a "lanes" tag that specifies how many traffic lanes a highway has. However, most road features lack this tag, and OSM road network data carry almost no explicit indication of multilane road properties; the data are therefore of limited use for research on the functional levels of roads. The goal of this study was to extract multilane roads from OSM urban road networks, for the following reasons:
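To check how sparse the "lanes" tag is in a given extract, one can count how many road ways carry a usable value. The sketch assumes each way's tags have already been loaded into a Python dict by some OSM parser; the helper names are hypothetical, and resolving semicolon-separated multiple values to their maximum is an assumption made only for this example.

```python
def parse_lanes(tags):
    """Return the lane count from an OSM tag dict, or None if absent or unparseable.

    Semicolon-separated values (e.g. "2;3") are resolved to the maximum here,
    which is an assumption of this sketch.
    """
    raw = tags.get("lanes")
    if raw is None:
        return None
    values = [int(p.strip()) for p in raw.split(";") if p.strip().isdigit()]
    return max(values) if values else None


def lanes_coverage(ways):
    """Fraction of highway=* ways that carry a usable lanes tag."""
    roads = [tags for tags in ways if "highway" in tags]
    if not roads:
        return 0.0
    tagged = sum(1 for tags in roads if parse_lanes(tags) is not None)
    return tagged / len(roads)
```

A low coverage value on real city data would confirm the point above: the tag alone cannot be relied on to identify multilane roads.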


1. The multilane roads in an urban road network form its framework. Multilane roads generally have high traffic capacity and reflect the pattern of urban traffic flow. Analyzing the traffic flow of multilane roads is therefore very important when constructing urban road networks.


2. High level-of-detail (LoD) data on urban road networks are required when building road network databases. The quality of multilane road data directly affects the quality of road network data at different scales, and hence the quality of multiscale map representation. It is therefore important to identify the most appropriate way to extract multilane roads from a road network when building an application database.


3. The multilane roads of an urban road network play important roles in geographical analysis, traffic analysis, traffic application services, and so on. The multilane road network is also important for building urban road network models and for determining the functional levels of roads.


This study extracted multilane roads from the OSM road network using a random forest (RF)-based method. Most multilane roads in a city are represented by multiple lines, which, together with their intersection points, can be viewed as a set of closed polygons. Multilane roads can therefore be extracted using polygon analysis techniques (Li, Fan, Luan, Yang, & Liu, 2014), and this study proposes a polygon-based intelligent extraction method for the multilane roads of urban road networks. The proposed method uses more effective shape descriptors for circularity, complexity, and compactness to describe multilane polygons. Candidate polygons are then evaluated by a second trained RF that combines these shape descriptors with the topological characteristics of the polygons between roads. The method is highly feasible and introduces no loss of precision, making it a significant step toward improving and optimizing road networks.
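Turning a line network into closed polygons can be sketched as planar face enumeration. The sketch assumes the road lines have already been split at every intersection (noded), so each edge joins two node ids. It uses a standard half-edge traversal (at each node, take the next edge clockwise from the one just arrived on), which is swapped in for illustration. The original work may instead rely on a GIS library's polygonize routine, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def signed_area(coords):
    """Shoelace formula; positive for counter-clockwise rings."""
    s = 0.0
    n = len(coords)
    for i in range(n):
        (x1, y1), (x2, y2) = coords[i], coords[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return s / 2.0


def polygonize(nodes, edges):
    """Enumerate bounded faces of a noded planar graph.

    nodes: dict node_id -> (x, y); edges: iterable of (u, v) node-id pairs.
    Returns a list of rings (lists of node ids), each counter-clockwise.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    # Neighbours of each node sorted counter-clockwise by direction angle.
    order = {}
    for v, ns in nbrs.items():
        x0, y0 = nodes[v]
        order[v] = sorted(ns, key=lambda w, x0=x0, y0=y0:
                          math.atan2(nodes[w][1] - y0, nodes[w][0] - x0))
    visited = set()
    faces = []
    for u in order:
        for v in order[u]:
            if (u, v) in visited:
                continue
            ring, a, b = [], u, v
            while (a, b) not in visited:  # follow half-edges until the cycle closes
                visited.add((a, b))
                ring.append(a)
                around = order[b]
                nxt = around[around.index(a) - 1]  # next neighbour clockwise
                a, b = b, nxt
            # The unbounded outer face comes out clockwise (negative area).
            if signed_area([nodes[n] for n in ring]) > 0:
                faces.append(ring)
    return faces
```

For a dual carriageway drawn as two parallel lines joined by cross streets, each block between the lines becomes one long, narrow face, which is exactly the kind of polygon the descriptors are meant to recognize.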


The remainder of this article is organized as follows. Section 2 provides an overview of prior work related to this study. Section 3 describes the method for extracting multilane roads using the RF in detail. Section 4 describes and discusses the experimental results, and Section 5 presents concluding remarks.



Road network synthesis is an important research field in cartography, and considerable research has been conducted on matching, recognizing, and extracting roads (Kuntzsch, Sester, & Brenner, 2016; Volker & Fritsch, 1999; Xiong, 2000). Numerous approaches exist for extracting multilane roads, including manual, semi-automated, and fully automated ones. Early work used road-level attributes as the extraction criterion (Wang, 1994); however, this approach is limited by factors such as data quality and the expertise of data providers. Some scholars have proposed the concept of a "stroke," defined as a connected, unbranched, and coherent road; multilane roads can then be selected according to stroke order (Thomson, 2006; Thomson & Richardson, 1999; Yang, Luan, & Li, 2011). The stroke value can be calculated from multiple attributes, such as road length (Chaudhry & Mackaness, 2005), connectivity between strokes (Zhang, 2005), and other measures (Jiang & Claramunt, 2004). Indeed, the stroke is an effective structural model that supports road network analysis based on the importance of each road path, even without other information (Mackaness, Ruas, & Sarjakoski, 2011). However, the stroke concept does not consider spatial topology, so it is accurate only at the local level. In recent years, several methods have been proposed for extracting road networks based on their geometric features, topological relations, and spatial distribution characteristics (Guo, Qian, Huang, He, & Liu, 2014; He, Qian, Liu, Wang, & Hu, 2015). Some of these introduce intelligent algorithms, including a case study approach (Guo et al., 2014), a genetic algorithm (Wang & Deng, 2005), and neural networks (Balboa & López, 2008; Zhou & Li, 2014). The case study methodology simplifies the complex extraction process but depends heavily on an expert case library.
The genetic algorithm is time-consuming, and the genetic model can suffer from convergence problems. Although each intelligent method for analyzing road networks has its own advantages and disadvantages, these methods will continue to improve with further research and technological advances.
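Stroke construction is usually based on the "good continuation" principle: at each junction, the incident segments that continue most nearly straight are linked. The sketch below is a simplified greedy variant written for illustration (at each node, link the pair with the smallest deflection angle first, and link each segment at most once per node). It is not the exact procedure of any cited work, and the 45° limit and function name are hypothetical.

```python
import math
from collections import defaultdict


def build_strokes(nodes, segments, max_deflection_deg=45.0):
    """Group road segments into strokes by greedy good-continuation pairing.

    nodes: dict node_id -> (x, y); segments: list of (u, v) node-id pairs.
    Returns a sorted list of strokes, each a sorted list of segment indices.
    """
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    incident = defaultdict(list)
    for sid, (u, v) in enumerate(segments):
        if u != v:
            incident[u].append(sid)
            incident[v].append(sid)

    limit = math.radians(max_deflection_deg)
    for n, sids in incident.items():
        x0, y0 = nodes[n]
        vecs = {}
        for sid in sids:
            u, v = segments[sid]
            ox, oy = nodes[v if u == n else u]  # far end of the segment
            vecs[sid] = (ox - x0, oy - y0)
        pairs = []
        for i, a in enumerate(sids):
            for b in sids[i + 1:]:
                (ax, ay), (bx, by) = vecs[a], vecs[b]
                angle = math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by)
                deflection = math.pi - angle  # 0 means perfectly straight
                if deflection <= limit:
                    pairs.append((deflection, a, b))
        used = set()
        for _, a, b in sorted(pairs):  # straightest continuation first
            if a not in used and b not in used:
                used.update((a, b))
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for sid in range(len(segments)):
        groups[find(sid)].append(sid)
    return sorted(groups.values())
```

At a T-junction, for instance, the two collinear segments are merged into one stroke and the perpendicular branch starts a stroke of its own. Note that nothing here uses topology beyond the junction itself, which illustrates the local-only limitation mentioned above.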


Some studies of multilane road extraction are line-based: nearby parallel lines that satisfy appropriate angle, length, and distance criteria are identified as multilane roads and are then connected by growing buffers to generate the road network (Yang et al., 2011; Zhang, 2009). However, because some VGI data, such as OSM road network data, are of poor quality, extracting multilane roads from lines alone is both time-consuming and error-prone (Li et al., 2014). Fortunately, a newer approach based on polygon analysis (Li et al., 2014) converts road lines to polygons to better describe the road network and then uses a support vector machine (SVM) to classify multilane roads. Polygon analysis copes better with the quality problems of VGI data, but it requires capable polygon shape descriptors and an effective method for determining which polygons represent multilane roads (Li et al., 2014).
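The line-based criteria mentioned above (angle, length, and distance) can be expressed as a simple pairwise test on two straight segments. In the sketch below, the thresholds, the use of the midpoint offset as the distance, and the overlap ratio are all hypothetical placeholders rather than values from the cited studies; the buffer-growing step that joins accepted pairs is not shown.

```python
import math


def _undirected_angle(seg):
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi  # direction in [0, pi)


def is_parallel_pair(seg_a, seg_b, max_angle_deg=10.0, min_dist=2.0,
                     max_dist=40.0, min_overlap_ratio=0.5):
    """Heuristic test for two segments being opposite carriageways of one road.

    All thresholds are hypothetical placeholders that would need calibrating.
    """
    diff = abs(_undirected_angle(seg_a) - _undirected_angle(seg_b))
    diff = min(diff, math.pi - diff)
    if math.degrees(diff) > max_angle_deg:
        return False
    (ax1, ay1), (ax2, ay2) = seg_a
    len_a = math.hypot(ax2 - ax1, ay2 - ay1)
    len_b = math.hypot(seg_b[1][0] - seg_b[0][0], seg_b[1][1] - seg_b[0][1])
    if len_a == 0 or len_b == 0:
        return False
    ux, uy = (ax2 - ax1) / len_a, (ay2 - ay1) / len_a  # unit vector along seg_a
    along, across = [], []
    for bx, by in seg_b:
        dx, dy = bx - ax1, by - ay1
        along.append(dx * ux + dy * uy)    # position along seg_a's axis
        across.append(-dx * uy + dy * ux)  # signed offset from seg_a's line
    dist = abs(sum(across) / 2.0)          # offset of seg_b's midpoint
    if not min_dist <= dist <= max_dist:
        return False
    overlap = min(max(along), len_a) - max(min(along), 0.0)
    return overlap >= min_overlap_ratio * min(len_a, len_b)
```

Each pair has to be tuned and checked separately, and noisy OSM geometry (slight kinks, duplicated lines lying almost on top of each other) easily breaks such tests, which is the motivation for moving to polygon analysis.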


In contrast to the studies above, and taking full advantage of the polygon analysis method, we extract multilane roads from OSM data using a machine-learning approach based on RFs. In this study, polygon circularity, parallelism, and width are defined, and shape descriptors are extracted using the discrete Fourier transform. Together with other geometric features such as compactness, perimeter, and complexity, these form the input to a first RF that extracts first-stage candidates. A second RF then evaluates the first-stage multilane road candidate polygons to produce the final set of multilane roads. This second model is trained on the input dataset augmented with the proposed topological relationships, including topological intensity and topological connections based on the candidates.
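One standard way to obtain discrete Fourier transform shape descriptors, shown here as an assumption since the paper's exact formulation is not given in this section, is to resample the polygon boundary at evenly spaced points, treat each point as a complex number x + iy, and take normalized coefficient magnitudes. Dropping the zero-frequency term removes translation, dividing by the first harmonic removes scale, and using magnitudes removes rotation and the choice of start point. The sample count, number of harmonics, and function names are hypothetical, and the ring is assumed to be counter-clockwise so that the first harmonic dominates.

```python
import cmath
import math


def resample_ring(ring, n):
    """Return n complex points spaced evenly by arc length along a closed ring."""
    pts = list(ring) + [ring[0]]
    seglens = [math.dist(pts[i], pts[i + 1]) for i in range(len(ring))]
    step = sum(seglens) / n
    out = []
    i, acc = 0, 0.0  # acc = arc length at the start of segment i
    for k in range(n):
        t = k * step
        while acc + seglens[i] < t:
            acc += seglens[i]
            i += 1
        f = (t - acc) / seglens[i] if seglens[i] > 0 else 0.0
        (x1, y1), (x2, y2) = pts[i], pts[i + 1]
        out.append(complex(x1 + f * (x2 - x1), y1 + f * (y2 - y1)))
    return out


def fourier_descriptor(ring, n_samples=64, n_harmonics=4):
    """Magnitude-based Fourier descriptor of a counter-clockwise ring."""
    z = resample_ring(ring, n_samples)
    N = len(z)
    # Naive DFT; F[k] for negative k is read via Python's negative indexing.
    F = [sum(z[m] * cmath.exp(-2j * math.pi * k * m / N) for m in range(N)) / N
         for k in range(N)]
    scale = abs(F[1])  # first harmonic carries overall size
    ks = list(range(-n_harmonics, 0)) + list(range(2, n_harmonics + 1))
    return [abs(F[k]) / scale for k in ks]
```

Elongated polygons, such as the strip between two carriageways, have a strong F[-1] term relative to F[1], so the descriptor separates them from squarish city blocks even when the two have similar areas.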



A road network is composed of lines, and the complex topological relations between road segments allow the entire network to be regarded as a group of polygons. Multilane roads in these networks always contain parallel lines, so the polygons that describe multilane roads can be recognized by this feature (Figure 1). The approach in this article defines geometric and topological descriptors for the polygons and then applies RFs to perform a binary classification of each polygon as either a multilane road or not a multilane road.
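The topological features used by the second RF are only named here ("topological intensity" and "topological connections"), not defined. The sketch below therefore computes hypothetical stand-ins: for each polygon, the number of polygons sharing an edge with it, how many of those were flagged as candidates by the first classifier, and the resulting ratio. Polygons are given as rings of node ids. These per-polygon values would be appended to the geometric descriptors before training the second classifier (for example scikit-learn's `RandomForestClassifier`); that training step is not shown.

```python
from collections import defaultdict


def ring_edges(ring):
    """Undirected edges of a closed ring of node ids."""
    return {frozenset((ring[i], ring[(i + 1) % len(ring)])) for i in range(len(ring))}


def topological_features(faces, candidates):
    """Per-face adjacency counts relative to first-stage candidates.

    faces: list of rings (lists of node ids); candidates: set of face indices
    labelled positive by the first classifier. These features are illustrative
    stand-ins, not the paper's definitions.
    """
    owners = defaultdict(set)  # edge -> indices of faces using that edge
    for idx, ring in enumerate(faces):
        for e in ring_edges(ring):
            owners[e].add(idx)
    feats = []
    for idx, ring in enumerate(faces):
        nbrs = set()
        for e in ring_edges(ring):
            nbrs |= owners[e]
        nbrs.discard(idx)
        n_cand = len(nbrs & candidates)
        feats.append({
            "n_neighbors": len(nbrs),
            "n_candidate_neighbors": n_cand,
            "candidate_ratio": n_cand / len(nbrs) if nbrs else 0.0,
        })
    return feats
```

The intuition is that a true multilane strip tends to be chained to other candidate strips along the same road, whereas an isolated false positive is usually surrounded by non-candidates.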


3.1 | Data preprocessing

OSM is a free worldwide vector map dataset created by volunteers from all over the world. Because some volunteers lack professional training, the OSM dataset has several problems in terms of both data quality and data availability. First, some road data are created repeatedly by different volunteers, so the OSM data, which are not professionally checked, may contain repeated lines. Second, some contributions by non-professional volunteers may be incorrect (Goodchild & Li, 2012); for example, there are unreasonable angles between lines, disconnected lines, and even entangled lines, none of which should occur in correct road data. In addition, the OSM road network data have no multilane road attribute, so multilane roads in the urban road network cannot simply be extracted from pre-existing attributes. Instead, the characteristics of the road network data must be analyzed carefully, and the original OSM road network data must be preprocessed thoroughly so that the processed data meet the requirements of the method studied in this article.
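A first preprocessing step for the repeated-line problem can be sketched as follows: snap coordinates to a small tolerance grid, give each line a canonical direction so that a line and its reverse compare equal, and drop degenerate geometries. The tolerance and function names are hypothetical. This catches only lines that duplicate each other vertex-for-vertex within the tolerance, not partially overlapping ones, and it does not address disconnected or entangled lines.

```python
def normalize_line(coords, tol=1e-6):
    """Snap coordinates to a grid of size tol and pick a canonical direction."""
    snapped = tuple((round(x / tol), round(y / tol)) for x, y in coords)
    return min(snapped, snapped[::-1])  # a line and its reverse get the same key


def remove_duplicate_lines(lines, tol=1e-6):
    """Keep the first copy of each line; drop repeats and degenerate geometry."""
    seen = set()
    kept = []
    for coords in lines:
        if len(coords) < 2:
            continue  # fewer than two vertices: not a line
        key = normalize_line(coords, tol)
        if len(set(key)) < 2:
            continue  # all vertices collapse to a single point
        if key in seen:
            continue  # repeated line (possibly digitized in reverse)
        seen.add(key)
        kept.append(coords)
    return kept
```

The tolerance should match the coordinate reference system: 1e-6 is roughly decimeter-level in geographic degrees, but would be far too strict for projected coordinates in meters.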


