I'm working on a C# application that applies several readability formulas to a text, such as Gunning-Fog, Precise SMOG, and Flesch-Kincaid.
Now I need to implement the Fry-based Grade formula in my program. I understand the formula's logic: you take three 100-word samples, calculate the average number of sentences per 100 words and the average number of syllables per 100 words, and then plot those two values on a graph.
Here is a more detailed explanation on how this formula works.
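For reference, the averaging step described above could be sketched roughly like this. The class and method names are made up for illustration, and the sketch assumes each sample is exactly 100 words long, so the raw counts are already "per 100 words":

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class FrySampleAverages
{
    // Returns the mean of the per-sample counts (sentences or syllables).
    // Sentence counts may be fractional, because the last sentence of a
    // 100-word sample is usually counted only in part.
    public static double Average(params double[] countsPer100Words)
    {
        if (countsPer100Words == null || countsPer100Words.Length == 0)
            throw new ArgumentException("At least one sample is required.");
        return countsPer100Words.Average();
    }
}
```

With made-up sample counts, FrySampleAverages.Average(124, 141, 146) gives 137 syllables per 100 words, and FrySampleAverages.Average(6.6, 5.5, 6.8) gives about 6.3 sentences per 100 words.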
I already have the averages, but I have no idea how to tell my program to "go check the graph, plot the values, and give me a level." I don't have to show the graph to the user; I only have to show the level.
I was thinking that maybe I could keep all the values in memory, divided into levels, for example:
Level 1: values whose sentence average is between 10.0 and 25+, and whose syllable average is between 108 and 132.
Level 2: values whose sentence average is between 7.7 and 10.0, and so on.
The problem is that, so far, the only place where I have found the values that define a level is the graph itself, and they can't be read off it accurately. If I took the values from the graph for the approach above, my level boundaries would be imprecise, and so the Fry-based Grade would not be accurate.
So, does anyone know where I can find exact values for the different levels of the Fry-based Grade, or can anyone help me think of a way to work around this?
Thanks
Well, I'm not sure this is the most efficient solution, or the best one, but at least it does the job.
I gave up on the idea of having a math formula to get the levels. Maybe such a formula exists, but I couldn't find it.
So I took Fry's graph, with all the levels, and painted each level a different color. Then I loaded the image in my program using:
Bitmap image = new Bitmap(@"C:\FryGraph.png");
// x and y are the pixel coordinates computed from the two averages
Color color = image.GetPixel(x, y);
As you can see, after loading the image I use the GetPixel method to get the color at the specified coordinates. I had to do some conversion to get the pixel coordinates for a given value on the graph, since the scale of the graph is not the same as the pixel scale of the image.
In the end, I compare the color returned by GetPixel against the colors I painted to see which Fry readability level the text falls in.
I hope this helps someone who faces the same problem.
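The conversion from a graph value to a pixel coordinate might look something like the sketch below. It is only an illustration: the helper name, the axis range, and the pixel bounds are assumptions, and the mapping is linear, so it suits the syllable axis. The sentence axis of the Fry graph is not linear, so it would need interpolation between its gridlines instead.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Drawing.
public static class GraphPixelMapper
{
    // Linearly maps a value on a graph axis to a pixel coordinate.
    // pixelMin is the pixel where axisMin is drawn, pixelMax the pixel
    // where axisMax is drawn. For a vertical axis, where pixel rows grow
    // downward, pixelMin can be larger than pixelMax.
    public static int ToPixel(double value, double axisMin, double axisMax,
                              int pixelMin, int pixelMax)
    {
        if (axisMax == axisMin)
            throw new ArgumentException("The axis range must not be empty.");

        // Clamp so out-of-range values land on the edge of the plot area.
        double clamped = Math.Max(axisMin, Math.Min(axisMax, value));
        double fraction = (clamped - axisMin) / (axisMax - axisMin);
        return pixelMin + (int)Math.Round(fraction * (pixelMax - pixelMin));
    }
}
```

For example, if the syllable axis ran from 108 to 172 and was drawn between pixel columns 50 and 690 (both assumed numbers), GraphPixelMapper.ToPixel(140, 108, 172, 50, 690) would return 370, the middle of the plot area.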
Cheers.
You simply need to determine the formula for the graph. That is, a formula that accepts the number of sentences and number of syllables, and returns the level.
If you can't find the formula, you can determine it yourself. Estimate the linear equation for each of the lines on the graph. Also estimate the 'out-of-bounds' areas in the 'long words' and 'long sentences' areas.
Now, for each point, just determine the region in which it resides: which lines it is above and which lines it is below. This is fairly simple algebra. Unfortunately, this is the best link I can find to describe how to do that.
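As a rough sketch of that algebra (the class and method names here are invented for illustration, and the points would have to be read off the graph by hand): a line through two points has slope (y2 - y1) / (x2 - x1) and intercept y1 - slope * x1, and a point (x, y) lies above the line when y > slope * x + intercept.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only.
public sealed class BoundaryLine
{
    public double Slope { get; }
    public double Intercept { get; }

    public BoundaryLine(double slope, double intercept)
    {
        Slope = slope;
        Intercept = intercept;
    }

    // Builds y = Slope * x + Intercept through two points on the line.
    public static BoundaryLine FromPoints(double x1, double y1, double x2, double y2)
    {
        if (x1 == x2)
            throw new ArgumentException("A vertical line has no slope-intercept form.");
        double slope = (y2 - y1) / (x2 - x1);
        return new BoundaryLine(slope, y1 - slope * x1);
    }

    // True when the point lies strictly above the line.
    public bool IsAbove(double x, double y)
    {
        return y > Slope * x + Intercept;
    }
}

public static class RegionFinder
{
    // Expects the boundaries ordered from the topmost line to the lowest.
    // Returns the index of the first boundary the point is above, or
    // boundaries.Count when the point is below all of them.
    public static int FindRegion(IReadOnlyList<BoundaryLine> boundaries, double x, double y)
    {
        for (int i = 0; i < boundaries.Count; i++)
        {
            if (boundaries[i].IsAbove(x, y))
                return i;
        }
        return boundaries.Count;
    }
}
```

The returned index still has to be mapped to a grade level, and the 'out-of-bounds' areas need their own checks before this lookup.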
I have made a first pass at solving this that I thought I would share in case someone else comes looking in the future. I built on the answer above and created a generic list of linear equations that can be used to determine an approximate grade level. First I had to correct the values to make the graph more linear. This does not take the invalid areas into account, but I may revisit that. The equation class:
public class GradeLineEquation
{
// using the form y = mx + b,
// that is, y = Slope * x + yIntercept
public int GradeLevel { get; set; }
public float Slope { get; set; }
public float yIntercept { get; set; }
public float GetYGivenX(float x)
{
float result = 0;
result = (Slope * x) + yIntercept;
return result;
}
public GradeLineEquation(int gradelevel,float slope,float yintercept)
{
this.GradeLevel = gradelevel;
this.Slope = slope;
this.yIntercept = yintercept;
}
}
Here is the FryCalculator:
public class FryCalculator
{
// This class normalizes the plot on the Fry readability graph the same way a person would, by choosing points on the graph based on values, even though
// the y-axis is non-linear and neither axis starts at 0. It just picks a relative point on each axis to plot the intercept of the zero and infinite slope lines.
private List<GradeLineEquation> linedefs = new List<GradeLineEquation>();
public FryCalculator()
{
LoadLevelEquations();
}
private void LoadLevelEquations()
{
// load the estimated linear equations for each line with the
// grade level, Slope, and y-intercept
linedefs.Add(new NLPTest.GradeLineEquation(1, 0.5f, 22.5f));
linedefs.Add(new NLPTest.GradeLineEquation(2, 0.5f, 20.5f));
linedefs.Add(new NLPTest.GradeLineEquation(3, 0.6f, 17.4f));
linedefs.Add(new NLPTest.GradeLineEquation(4, 0.6f, 15.4f));
linedefs.Add(new NLPTest.GradeLineEquation(5, 0.625f, 13.125f));
linedefs.Add(new NLPTest.GradeLineEquation(6, 0.833f, 7.333f));
linedefs.Add(new NLPTest.GradeLineEquation(7, 1.05f, -1.15f));
linedefs.Add(new NLPTest.GradeLineEquation(8, 1.25f, -8.75f));
linedefs.Add(new NLPTest.GradeLineEquation(9, 1.75f, -24.25f));
linedefs.Add(new NLPTest.GradeLineEquation(10, 2f, -35f));
linedefs.Add(new NLPTest.GradeLineEquation(11, 2f, -40f));
linedefs.Add(new NLPTest.GradeLineEquation(12, 2.5f, -58.5f));
linedefs.Add(new NLPTest.GradeLineEquation(13, 3.5f, -93f));
linedefs.Add(new NLPTest.GradeLineEquation(14, 5.5f, -163f));
}
public int GetGradeLevel(float avgSylls,float avgSentences)
{
// first normalize the given values to Cartesian positions on the graph
float x = NormalizeX(avgSylls);
float y = NormalizeY(avgSentences);
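// given x, find the first grade level equation that produces a lower y at that x
GradeLineEquation match = linedefs.Find(a => a.GetYGivenX(x) < y);
// Find returns null when the point lies below every line; return -1 for that case instead of throwing
return match != null ? match.GradeLevel : -1;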
}
private float NormalizeY(float avgSentenceCount)
{
float result = 0;
int lower = -1;
int upper = -1;
// load the list of y-axis line intervals
List<double> intervals = new List<double> {2.0, 2.5, 3.0, 3.3, 3.5, 3.6, 3.7, 3.8, 4.0, 4.2, 4.3, 4.5, 4.8, 5.0, 5.2, 5.6, 5.9, 6.3, 6.7, 7.1, 7.7, 8.3, 9.1, 10.0, 11.1, 12.5, 14.3, 16.7, 20.0, 25.0 };
// find the highest line that is lower than or equal to the number we have
lower = intervals.FindLastIndex(a => ((double)avgSentenceCount) >= a);
// if we are not over the top or on the line grab the next higher line value
if(lower > -1 && lower < intervals.Count-1 && ((float) intervals[lower] != avgSentenceCount))
upper = lower + 1;
// set the integer portion of the response
result = (float)lower;
// if we have an upper limit calculate the percentage above the lower line (to two decimal places) and add it to the result
if(upper != -1)
result += (float)Math.Round((((avgSentenceCount - intervals[lower])/(intervals[upper] - intervals[lower]))),2);
return result;
}
private float NormalizeX(float avgSyllableCount)
{
// the x axis is MUCH simpler. Subtract 108 and divide by 2 to get the x position relative to a 0 origin.
float result = (avgSyllableCount - 108) / 2;
return result;
}
}
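To make the normalization easier to follow, here is a small standalone sketch of the same two steps. It is a simplified re-implementation for illustration, not the class above: the name FryAxisNormalizer is made up, it works in double rather than float, and it skips the rounding to two decimal places.

```csharp
using System;

// Hypothetical standalone version of the two normalization steps.
public static class FryAxisNormalizer
{
    // The y-axis gridline values, copied from NormalizeY above.
    private static readonly double[] SentenceGridlines =
    {
        2.0, 2.5, 3.0, 3.3, 3.5, 3.6, 3.7, 3.8, 4.0, 4.2,
        4.3, 4.5, 4.8, 5.0, 5.2, 5.6, 5.9, 6.3, 6.7, 7.1,
        7.7, 8.3, 9.1, 10.0, 11.1, 12.5, 14.3, 16.7, 20.0, 25.0
    };

    // x axis: starts at 108 syllables, with one grid step every 2 syllables.
    public static double NormalizeSyllables(double avgSyllables)
    {
        return (avgSyllables - 108) / 2;
    }

    // y axis: the index of the gridline at or below the value, plus the
    // fraction of the way to the next gridline. Returns -1 below the
    // lowest gridline and the last index at or above the highest one.
    public static double NormalizeSentences(double avgSentences)
    {
        int lower = Array.FindLastIndex(SentenceGridlines, g => avgSentences >= g);
        if (lower < 0 || lower == SentenceGridlines.Length - 1)
            return lower;

        double span = SentenceGridlines[lower + 1] - SentenceGridlines[lower];
        return lower + (avgSentences - SentenceGridlines[lower]) / span;
    }
}
```

As a worked example, 140 syllables gives x = (140 - 108) / 2 = 16, and 5.1 sentences falls halfway between the 5.0 and 5.2 gridlines (indexes 13 and 14), so y is about 13.5. Using the line table above, the grade 7 line gives 1.05 * 16 - 1.15 = 15.65, which is not below 13.5, while the grade 8 line gives 1.25 * 16 - 8.75 = 11.25, which is, so that point comes out as grade 8.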