What is the source for ArcMap's Jenks Optimization classification?

December 15 2008 | 10 comments
Categories: ArcGIS Methods, Symbology

Does ArcMap use the Jenks-Caspall or the Fisher-Jenks algorithm for classifying data into natural breaks? I did some research on support.esri.com and found that ArcView 3.x appears to have used Fisher-Jenks, but the ArcGIS Desktop documentation only talks generically about Jenks Optimization without alluding to which algorithm it uses.

If someone could enlighten me, I would appreciate it.

Neither of the algorithms you referenced is robust, in the sense of handling a wide variety of data distributions in a failsafe fashion.  The ArcView 3.x algorithm was based on Fisher-Jenks, and for normally distributed data with a rich variety of values, our algorithm produced results that barely varied (in the 5th decimal place or beyond).  In fact, everyone who has implemented the algorithm to operate on more than an ideal dataset produces similarly small variations.

Less-than-ideal data in this case means, for example, a dataset where 70% of the values = 4.0 and the user asks for 6 classes.  That is nothing Jenks, Fisher, or Caspall ever intended their work to be used on, but it is a common occurrence today. ArcMap’s implementation is a bit more robust and efficient, and produces no more variance in output class break values than the ArcView implementation did for ideal (expected) data distributions.

So to answer your question, it’s the ESRI ArcGIS implementation of the Fisher-Jenks algorithm.  Our only mistake in naming was calling it Natural Breaks, which can mean one of two things.  First, and most commonly, it is a manual process of placing class breaks on a histogram such that the breaks fall in the troughs and the classes represent natural clumps in the data’s distribution.  Second, and more loosely, it is a class of algorithms that generate class breaks based on variance.  These can produce considerably wider variations in class breaks than any of the specific implementations of the Fisher-Jenks algorithm.
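
For readers who want to experiment with a variance-based class break generator, here is a minimal Python sketch of the textbook Fisher-Jenks idea: sort the values, then use dynamic programming to find the contiguous partition that minimizes the total within-class sum of squared deviations. This is not the ESRI implementation, which is not published here. The function name `fisher_jenks_breaks` and its parameters are assumptions made for illustration, and the sketch runs in O(k·n²) time, so it is only suitable for small datasets.

```python
from typing import List, Sequence


def fisher_jenks_breaks(values: Sequence[float], n_classes: int) -> List[float]:
    """Return the upper bound of each class for an optimal 1-D partition.

    Textbook dynamic programming that minimizes the total within-class sum
    of squared deviations. Illustrative only; not ESRI's implementation.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any sorted slice in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(data):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        sq = prefix_sq[j] - prefix_sq[i]
        return max(sq - s * s / (j - i), 0.0)  # clamp rounding noise

    inf = float("inf")
    # cost[k][j]: minimum total SSD when splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # start[k][j]: index where the last (k-th) class begins in that optimum.
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0

    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk back from the full dataset to recover each class's upper bound.
    breaks = []
    j = n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = start[k][j]
    return breaks[::-1]


# Three well-separated clumps yield one class per clump.
print(fisher_jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [3, 12, 22]
```

Feeding the sketch a dataset dominated by one value, such as `[4.0] * 7 + [1.0, 2.0, 9.0]` with 6 classes, returns repeated upper bounds of 4.0, because there are fewer distinct values than requested classes. A production implementation has to decide what to do in that situation.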

Actual formula posted by David Chevrier on Feb 24 2009 10:51AM
Since you use your own version of this, could you post the equation or, even better, the actual code? I'm trying to use Jenks with your Flex API for a new layer.
Sorry posted by MappingCenter Team on Feb 27 2009 11:29AM
That's proprietary information, along the lines of a trade secret, and so corporate policy is that we do not provide it.
Flex API posted by David Chevrier on Mar 16 2009 12:14PM
Fair enough. Any chance of adding this ability into the next version of the Flex API then? I guess this is more of a feature request though.
Probably not, but... posted by MappingCenter Team on Mar 16 2009 3:19PM
Since the system is designed, for a number of reasons, to author content in ArcGIS Desktop or Engine, and then serve it via ArcGIS Server, the pat answer is no. That said, you'd be better off presenting your argument to the ArcGIS Server team, on their forum: http://resources.esri.com/arcgisserver/apis/flex/index.cfm?fa=forums
Initial date for Fisher-Jenks at ESRI posted by Terry Slocum on May 26 2009 10:07AM
Can anyone tell me approximately when a variant of the Fisher-Jenks algorithm was first used in an ESRI software product and which product that was?
A Trip in the posted by Charlie Frye on May 26 2009 4:12PM
I did that in the spring/summer of 1995 with Gene Vaatveit, who did the programming (a.k.a. the real work for that "fanschy schmancy algorithm" as he wrote it in the Avenue documentation) for that first incarnation of our "Natural Breaks" algorithm. We did that as part of the initial scope of work for ArcView 3.0. (Note that ArcView 3.0 was not released until the following year.) Beyond the journal articles, we consulted Dick Groop at Michigan State University (Geography Dept.), as he was one of the few people who had actually implemented this sort of algorithm.
natural breaks default? posted by Terry Slocum on May 27 2009 7:45AM
I thought that the natural breaks method was the default classification option in ArcView 3.x and ArcGIS. Is that correct, and if so, how long has it been the default?
Default, from its introduction in ArcView 3.0 posted by Charlie Frye on May 27 2009 10:59AM
The reason we worked on what was then called the "Natural Breaks" classification was specifically to introduce something that could be used as the default classification. The prior options had been quantile or equal interval, which for all the usual reasons are not ideal as a default in a general-use desktop GIS/mapping software package, given that better methods were known.

In fact, I had written a variance-based classification as an Avenue sample for ArcView 2.1. In demonstrating that, I would always show that a simple, arbitrary class breaks method often produced a picture that was misleading to general audiences. So, from the very beginning we knew that the Natural Breaks family of classifications was much less likely to produce a misleading depiction.
Jenks Optimization code posted by Rosalyn Pereira on Feb 17 2011 2:22AM
You can find the pseudocode here:
http://resources.arcgis.com/content/kbase?fa=articleShow&d=26442
Thanks for the link posted by Aileen Buckley on Apr 15 2011 5:44PM