Feature #630

Work towards supporting data sets containing more records

Added by Andy Dufilie over 5 years ago. Updated about 5 years ago.

Status: Resolved
Start date: 11/07/2011
Priority: Immediate
Due date:
Assignee: Andy Dufilie
% Done: 100%
Category: Internal Code Refactoring
Target version: 1.0
Complexity: High
OIC Priority:
Required by: Grand Rapids, Michigan

Subtasks

Bug #732: map zoom gets messed up if actively probing while the shapefile is parsed, and the map doesn't draw until probed. (Resolved, Andy Dufilie)

Bug #739: bar chart doesn't zoom properly until something is probed. (Resolved, Andy Dufilie)


Related issues

Related to Weave - Bug #779: Michigan Blocks Loads in about 45 seconds when there is data binned to the shape file Open 01/17/2012
Related to Weave - Other #876: refactor the rendering code to be asynchronous Resolved 03/15/2012
Related to Weave - Other #877: rewrite Bounds2D.getGridContainment() to allow more efficient comparisons Resolved 03/15/2012
Blocked by Weave - Other #604: Run the profiler and find out which functions are slow. Resolved

History

#1 Updated by Andy Dufilie over 5 years ago

  • Category set to Internal Code Refactoring
  • Assignee set to Andy Dufilie
  • Target version set to 1.0
  • Complexity set to High

#2 Updated by Andy Dufilie over 5 years ago

  • Status changed from Open to In Progress

#3 Updated by Andy Dufilie over 5 years ago

Here's what the code is doing now.

There are two sets of tiles: geometry tiles and metadata tiles. The geometry detail is not downloaded until you change the zoom to a level where you would be able to see the shapes. When you first load a geometry column, it requests the complete list of "metadata tiles", which contain keys and bounding boxes. This takes a long time if there are a lot of records. This is easily changed on this line of code to make it download only the metadata tiles that would actually be rendered at the current zoom level.

One problem is that if we don't download all the metadata tiles and you want to query your custom radius or polygon, it won't get the keys that it hasn't downloaded the metadata for yet, since the spatial query is computed on the client side. Also, when you make a selection rectangle on the map and you haven't downloaded the keys yet, you won't get the same selection result compared to when you have all the keys and bounding boxes downloaded.

Another problem is that we don't cancel any download requests yet. There's no logic in the client that determines whether or not it still needs a pending download. That logic can be added, along with logic that prioritizes the parsing of the already-downloaded data based on the current zoom. Another thing we can do is store AMF3-encoded objects that wouldn't have to be parsed as much by the client (AMF=ActionScript message format).
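To make the idea of cancelling and prioritizing concrete, here is a minimal Python sketch (not Weave's ActionScript code). The `TileQueue` class, its method names, and the numeric priority are all hypothetical; the only point is that pending tiles can be dropped when the view no longer needs them, and the rest handled in priority order.

```python
import heapq


class TileQueue:
    """Hypothetical queue of pending tile ids, ordered by priority (lower = sooner)."""

    def __init__(self):
        self._heap = []        # entries are (priority, tile_id)
        self._pending = set()  # ids that are still wanted

    def request(self, tile_id, priority):
        """Queue a tile unless it is already pending."""
        if tile_id not in self._pending:
            self._pending.add(tile_id)
            heapq.heappush(self._heap, (priority, tile_id))

    def cancel_unneeded(self, still_needed):
        """Drop every pending tile that the current view no longer needs."""
        cancelled = self._pending - set(still_needed)
        self._pending -= cancelled
        return cancelled

    def next_tile(self):
        """Pop the most urgent tile that has not been cancelled, or None."""
        while self._heap:
            _, tile_id = heapq.heappop(self._heap)
            if tile_id in self._pending:  # skip entries that were cancelled
                self._pending.remove(tile_id)
                return tile_id
        return None
```

In this sketch the priority would be derived from the current zoom (for example, tiles in the middle of the visible area first), and `cancel_unneeded` would be called whenever the user pans or zooms.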

When you import a shapefile in Admin Console, it creates a table containing 3-dimensional bounding box information (x,y,z) for each tile and a binary blob of data for the tile. The 3-D bounding boxes for the tiles are sent to the client so the client knows which tiles it needs to request. Based on the current zoom level of the map, individual tiles are requested as needed. The geometry data (coordinates for the polygon vertices) are not readily available to be queried on the server. Right now, in order to compute a polygon intersection, the binary blobs must be downloaded and parsed on the client side. If you want to do spatial queries on the server side, we may be better off using a WFS server, since all that functionality is already implemented. If we start using server-side spatial queries, it would require a refactoring of the client probing/selection code because it then becomes an asynchronous query instead of an instant client-side KD-tree query.
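The tile-request step can be sketched roughly as follows, in Python rather than Weave's ActionScript. The `TileBounds` class and the `tiles_to_request` function are hypothetical names, and treating the z range as a detail/importance value compared against a zoom-dependent minimum is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileBounds:
    """Hypothetical 3-D bounding box for one tile: x/y extent plus a z range."""
    tile_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    z_min: float  # assumption: smallest detail/importance value in the tile
    z_max: float  # assumption: largest detail/importance value in the tile


def tiles_to_request(tiles, view, min_z, already_requested):
    """Return ids of tiles that overlap the visible area, hold data at or above
    the minimum z value for the current zoom, and have not been requested yet.

    view is (x_min, y_min, x_max, y_max) of the visible map area.
    """
    vx_min, vy_min, vx_max, vy_max = view
    needed = []
    for t in tiles:
        overlaps_xy = (t.x_min <= vx_max and t.x_max >= vx_min and
                       t.y_min <= vy_max and t.y_max >= vy_min)
        has_visible_detail = t.z_max >= min_z
        if overlaps_xy and has_visible_detail and t.tile_id not in already_requested:
            needed.append(t.tile_id)
    return needed
```

Zooming in would lower `min_z` in this sketch, so tiles holding finer detail start to qualify and get requested.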

I designed this geometry tile system early on in the project (about 2.5 years ago) and it works well for large shapefiles containing a small number of very detailed polygons (for example, a 40 MB US States shapefile displays very quickly). It does not use specific GIS functionality available in PostGIS or other databases because we don't want to force users to use a specific database. I don't remember all the details now, but when I tested the spatial querying features of PostGIS/MySQL, they were too slow (and MySQL only supports bounding boxes).

The problem we are having is related to a large number of records. This problem is not just limited to shape files -- the attribute columns in Weave do not support a large number of records either. Any improvement we make to the geometry support would not improve the performance of normal string/number column data.

#4 Updated by Andy Dufilie over 5 years ago

I've added an option to request only the bounding box information visible at the current zoom level (now enabled by default, changeable through the global settings panel under "Advanced").

Demos:
Boston 100,000 Parcels
Michigan 300,000 Blocks

Selection and probing on the map will not catch the shapes that are too small to be seen unless the bounding box info has been downloaded. To see this occur, follow these steps:
1. Open the Boston demo and wait for the shapes to finish downloading.
2. Draw a small selection rectangle inside Boston.
3. Draw a small zoom box inside Boston where you made the selection.
The smaller shapes will download when you zoom in and you will see that the small ones are not selected because their bounding boxes weren't there at the time you made the selection.
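The effect can be reproduced with a small Python sketch of a client-side rectangle query. The `select_keys` function and the parcel data are made up for illustration; the point is only that a query over the downloaded bounding boxes cannot return keys whose bounding boxes have not arrived yet.

```python
def select_keys(downloaded_bounds, rect):
    """Return keys whose bounding box overlaps the selection rectangle.

    downloaded_bounds maps key -> (x_min, y_min, x_max, y_max).
    """
    rx_min, ry_min, rx_max, ry_max = rect
    return sorted(
        key for key, (x_min, y_min, x_max, y_max) in downloaded_bounds.items()
        if x_min <= rx_max and x_max >= rx_min and y_min <= ry_max and y_max >= ry_min
    )


# Hypothetical data: one large parcel and one small parcel inside the rectangle.
all_bounds = {
    "large_parcel": (0, 0, 50, 50),
    "small_parcel": (10, 10, 10.5, 10.5),
}
# Zoomed out, the small parcel's bounding box has not been downloaded yet.
zoomed_out = {"large_parcel": all_bounds["large_parcel"]}
selection = (5, 5, 20, 20)

print(select_keys(zoomed_out, selection))  # ['large_parcel']
print(select_keys(all_bounds, selection))  # ['large_parcel', 'small_parcel']
```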

These demos still have the problems I've mentioned before about not cancelling downloads or prioritizing the parsing of the shapes.

#5 Updated by Andy Dufilie over 5 years ago

  • Required by set to Grand Rapids, Michigan

#6 Updated by Andy Dufilie over 5 years ago

I've just updated the nightly build to include an option for changing the minimum importance value for rendering geometries. It's in the last tab of the Window->Preferences panel.

#7 Updated by Andy Dufilie over 5 years ago

We are currently making incremental changes based on profiling and our understanding of various inefficiencies to improve the performance of the code, but it is unknown whether or not Weave will be able to fully support 300,000 records in the current rendering system by the requested deadline (end of 2011).

One option is to start using WMS and WFS for displaying and querying the shapes. In that case, we would have to create an adapter for the particular WMS server you would use, and we could write asynchronous wrapper functions for querying the WFS service.

Chris, what do you think of a WMS/WFS solution? Do you already have data on a server that supports those protocols?

#8 Updated by Chris Stefanich over 5 years ago

Currently we only have a WMS server set up using Mapnik 0.7.1 and ogcserver (more info can be found here: https://github.com/mapnik/OGCServer). We do not have a WFS server set up, nor have we ever set one up, so it would be a new learning curve/experience for us to do that.

We do have geography data (not indicator data) loaded into PostGIS, so it could easily feed our WMS to make the raw geography tiles (like blocks), but it could not fill them in with the indicator data. I think we would need to know more about the proposed solution and whether it would behave in a similar manner to the rest of Weave.

#9 Updated by Andy Dufilie over 5 years ago

In that case I think we will just continue as we are going. It is looking promising.

#10 Updated by david percy over 5 years ago

Chris, how are you adding a custom WMS? I don't see any options for that, and we're using a 2 week old update...
Thanks,
Percy

#11 Updated by Kyle Monico over 5 years ago

Weave doesn't support custom WMS currently. Many WMS providers use their own formats for requesting and encoding tiles. If you want it, please make a feature request :)

#12 Updated by david percy over 5 years ago

I swear Andy said on the conference call yesterday that it does!
We even talked about support for the wmsGetFeatureInfo request!
I'm happy to make a feature request, but some clarification of what we were talking about on Wednesday would be useful first...

#13 Updated by Andy Dufilie over 5 years ago

I wasn't on the call yesterday but we could definitely implement more WFS/WMS functionality if someone really needs it (and we get funding for it of course).

#14 Updated by Chris Stefanich over 5 years ago

Percy, we do not have a custom WMS that is interacting with weave, but we do have Mapnik installed on our server.

#15 Updated by david percy over 5 years ago

well who was I talking to?
:-)
I'll try to find out from Jim Farham...

Andy Dufilie wrote:

I wasn't on the call yesterday but we could definitely implement more WFS/WMS functionality if someone really needs it (and we get funding for it of course).

#16 Updated by david percy over 5 years ago

Kyle, actually the BEAUTY of WMS is that it's an extremely standardized implementation! Look at the javascript code in OpenLayers to see how it's implemented in a very generic way. It's also the most commonly implemented interface to data across all GIS software, with the exception of shapefile :)

The different implementations that you run into are the TILING schemes; again, the different layer types in OpenLayers will help enumerate these. So ArcGIS Server has one tiling scheme, OpenStreetMap has one, etc.

There's an OGC initiative that standardizes tiling, and it's supported by several open source products...

I'll go file that feature request now :)

Kyle Monico wrote:

Weave doesn't support custom WMS currently. Many WMS providers use their own formats for requesting and encoding tiles. If you want it, please make a feature request :)

#17 Updated by Andy Dufilie over 5 years ago

The spatial index was being recreated too many times and it was being created all at once instead of asynchronously. I've changed it so it is now asynchronous, and the interface is now more responsive. It was unresponsive previously because ActionScript is single-threaded. Overall the idea is to eliminate unnecessary duplicate or extra work and make long computations asynchronous.
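The general pattern of splitting a long computation into slices on a single thread looks roughly like this in Python (Weave itself is ActionScript, and this is not its code). The function names are hypothetical, and a plain dict stands in for the real spatial index.

```python
import time


def build_index_incrementally(records, index, batch_size=1000):
    """Generator that inserts records into `index` in batches, yielding
    between batches so the caller can handle other work (e.g. UI events)."""
    for start in range(0, len(records), batch_size):
        for key, bounds in records[start:start + batch_size]:
            index[key] = bounds  # stand-in for a real spatial-index insert
        yield min(start + batch_size, len(records))  # records done so far


def run_slice(task, max_seconds):
    """Advance `task` for at most about max_seconds; return True when finished."""
    deadline = time.monotonic() + max_seconds
    while time.monotonic() < deadline:
        try:
            next(task)
        except StopIteration:
            return True
    return False
```

A single-threaded UI loop would call `run_slice` once per frame with a small time budget and stop scheduling the task once it returns True, so the interface stays responsive while the index is still being built.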

Please try these demos again, which are now running the newest version:
Boston 100,000 Parcels
Michigan 300,000 Blocks

#18 Updated by Andy Dufilie over 5 years ago

  • Subject changed from Work towards supporting data sets containing 500,000 records to Work towards supporting data sets containing more records
  • Description updated

I'm changing the subject of this issue because the existing one is too vague. It doesn't mention anything about the content of the records (number of columns? data type?) or how many visualizations of what type would be used. These things make a big difference.

#19 Updated by Chris Stefanich over 5 years ago

That is a significant speed increase to show the geometries. Once we added an indicator to shade the map, however, it slowed way down:

http://weavetest.cridata.org:8787/cri-weave/weave.html?file=config/I_AM_TEST.xml

I downloaded and compiled this build yesterday around 3:30 or so.

Making good progress, thanks!

#20 Updated by Andy Dufilie over 5 years ago

Are you referring to the initial load, or panning around?

#21 Updated by Chris Stefanich over 5 years ago

The initial load is much slower when there are color bins. Panning was pretty responsive but kinda "flickery" for lack of a better term.

#22 Updated by Andy Dufilie about 5 years ago

  • Status changed from In Progress to Resolved

Closing this issue because progress has been made, and this issue is too vague. Separate issues will be created for other refactorings.
