High-Technically Correct by John M. Quick


Sunday, November 22, 2009


Cover To Cover: My First Photography Publications

Red Cedar Review

Photography first became a major hobby of mine in the spring of 2008. In the fall, I came across the opportunity to submit my work for publication in the Red Cedar Review (RCR). RCR is a high-quality literary journal published each year at Michigan State University (MSU). Two of the photographs that I submitted were accepted for publication in volume 44 of RCR (2009). Much to my surprise as a first-time publishee, one of my pictures was selected to appear on the front cover! Both images, as they appeared in RCR, are displayed below.

The Offbeat

A second literary journal that I discovered at MSU was The Offbeat. Again, two of my photos were published, this time in volume 9 of The Offbeat (2009). One was placed on the back cover! Therefore, I have now had my work on both covers of a literary journal. Following are my images as they appeared in The Offbeat volume 9.

Future Publications

I have continued to enjoy taking pictures and submitting them for publication. Two of my photos have been accepted into volume 45 of RCR, which will be released in spring 2010. I also have a pending submission out to The Sustainability Review at Arizona State University. I am always seeking new journals, magazines, and other locations to place my work. If you know of any, please share them. You can also follow the photos that I publish on my Flickr account.

P.S. You may have noticed that two of the four photos presented in this article are of a bottle resting in the Red Cedar River. It turned out to be a popular bottle for me, considering that both pictures that I took of it were published. If you are interested in seeing it in color, take a look on Flickr.


Sunday, November 15, 2009

RSS Icon

Dreaming Across Multiple Dimensions

Last night, I had a dream unlike any that I had experienced before. While progressing through what began as a "standard" dream, I ended up going to sleep. During this dream within a dream, I visited a particular place. Once I awoke from this inner dream, the original one continued. At a certain point in this outer dream, I came across the same place that I had visited in my inner dream. Here, I took special notice of the fact that I had dreamt of the place before ever seeing it, especially since the inner dream connected directly to the plot of the outer dream. This was similar to experiencing déjà vu in real life, in spite of being in a situation that had never been experienced before. Of course, the irony of the whole event is that it, too, was taking place within a dream and therefore was never experienced in reality.

I found this to be a very intriguing experience, especially in how it differs from the more common "dream within a dream" scenario. Many times in the past, I have awoken from a dream only to realize that I was still dreaming. This is a relatively common phenomenon in which a person does not know that he or she is dreaming within a dream. Therefore, waking up often feels genuine, until one realizes that he or she is still within a dream. In contrast, I knowingly went to sleep within my dream, dreamt, and later awoke to continue the original dream (all before ultimately waking into reality).

Dream Structure Models

Traditional

This experience has led me to ponder different structural relationships between dreams and the real world. For example, the diagram below represents what I would say is the traditional, or common, view. That is, there is a "real world" that a person lives in while awake and there is a "dream world" that a person enters while sleeping. Any sub-dreams that occur within a dream are still considered to take place within the dream world.

Hypothesized

On the other hand, the following diagram represents my hypothesized view of how the worlds might relate to each other in a different way. Both the real and dream worlds remain; however, successive dreams within dreams are now viewed as unique "dimensions," rather than components of the original dream world. In this perspective, a person's dream version of himself can have his own dreams. Much like the real version of a person enters the dream world, so too can a dream person enter his own dream world. Although I have not experienced this phenomenon beyond one additional level, it could conceivably continue indefinitely.

Expanded Hypothesis

As a final plot twist, I created the expanded diagram below. This model raises the question of whether our concept of reality is just a sub-level of some larger reality, in the same way that our dream worlds are subsets of our reality. Could what we perceive as being "awake" be part of some dream of which we are not aware? Might worlds exist beyond our own concept of reality?

Universal Perspective

Building from my expanded hypothesis, I created what I think to be an even more accurate depiction of the relationships between "worlds," "dimensions," or "realities." It seems unlikely that things are as linear as presented in my earlier diagrams. Instead, I suspect that the universe is unfathomably, unpredictably, and unintelligibly complex, with an unlimited number of interactions taking place simultaneously. The following diagram is applicable whether the dimensions of dream are taken to occur within a single individual's mind or in a larger system of realities.

Conclusion

I have always found dreams to be an intriguing topic. They range from being insightful to hilarious to outright bizarre. Dreams also have the especially mystical quality of being beyond human understanding. As can be seen on Wikipedia, there are enough dream theories to fill many years of intense study. Feel free to share your own interesting dream stories or overall perspectives on dreams in the comments section below.


Thursday, November 12, 2009


R Tutorial Series: Scatterplots

11/13/09 Update: I have started a new blog to compile all of the articles in the R Tutorial Series. This should help anyone who is especially interested in keeping track of and revisiting these items. The R Tutorial Series blog is available at http://www.rtutorialseries.blogspot.com

A scatterplot is a useful way to visualize the relationship between two variables. Similar to correlations, scatterplots are often used to make initial diagnoses before any statistical analyses are conducted. This tutorial will explore the ways in which R can be used to create scatterplots.

Tutorial Files

Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains pretest and posttest scores for 66 subjects on a series of reading comprehension tests (Moore & McCabe, 1989). Note that all code samples in this tutorial assume that this data has already been read into an R variable and has been attached.
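For readers who have not yet done the reading and attaching step, here is a minimal sketch of it. The tutorial does not name the downloaded file, so the path below is hypothetical, and the sketch writes a tiny made-up CSV (not the Moore & McCabe values) so that it runs on its own. In practice you would simply call read.csv() on the saved file.

```r
# Hypothetical file name; substitute the path to the downloaded .csv file.
csv_path <- file.path(tempdir(), "example_scores.csv")

# Write a tiny made-up CSV so the sketch is self-contained
# (these are NOT the Moore & McCabe reading-test scores).
writeLines(c("PRE1,POST1",
             "4,5",
             "6,8",
             "9,7",
             "12,11"), csv_path)

datavar <- read.csv(csv_path)  # data frame with one column per CSV field
str(datavar)                   # check the column names and types

attach(datavar)                # columns are now reachable by name
print(mean(PRE1))              # 7.75
detach(datavar)                # remove the attached copy when finished

# An alternative that avoids attach(): evaluate inside the data frame
with(datavar, mean(POST1))
```

attach() is convenient for short tutorials like this one, but with() keeps the column names from mixing with other variables in your workspace.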

Plotting Two Variables

The simplest way to create a scatterplot is to directly graph two variables using the default settings. In R, this can be accomplished with the plot(XVAR, YVAR) function, where XVAR is the variable to plot along the x-axis and YVAR is the variable to plot along the y-axis. Suppose that we want to get a picture of the relationship between pretest 1 (PRE1) and posttest 1 (POST1). The following example demonstrates how to use the plot(XVAR, YVAR) function to visualize this relationship.

    # create a scatterplot of Y on X using plot(XVAR, YVAR)
    # what does the relationship between pretest 1 and posttest 1 look like?
    plot(PRE1, POST1)

The output of the preceding function is pictured below.

Plotting All Variables

When beginning to analyze a dataset, researchers often want to get a complete picture of all relationships, rather than just a single one. Conveniently, the plot() function can also be run on an entire set of data. The format for this operation is plot(DATAVAR), where DATAVAR is the name of the R variable containing the data. Suppose now that our interest is in visualizing all of the scatterplots at once, in order to diagnose the various relationships present in our data. The following example demonstrates how to use the plot(DATAVAR) function.

    # create scatterplots of all variables using plot(DATAVAR)
    # what do all of the relationships in the data look like?
    plot(datavar)

The output of the preceding function is pictured below.

Note that the image above has been resized to fit on this page. In R's plotting window (the Quartz window on a Mac), the scatterplots can be enlarged for easier viewing.
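To try plot(DATAVAR) without the tutorial's file, here is a small sketch on a made-up data frame. For a data frame with more than two columns, plot() hands the data to pairs(), which draws the scatterplot matrix; you can also call pairs() directly, or subset the columns first to zoom in on a few relationships. The pdf(NULL) line only exists so the sketch runs without a screen and can be dropped in an interactive session.

```r
# Made-up scores with three columns (not the tutorial's dataset)
scores <- data.frame(PRE1  = c(4, 6, 9, 12, 10),
                     PRE2  = c(3, 7, 8, 11, 9),
                     POST1 = c(5, 8, 7, 11, 12))

pdf(NULL)                                # null device so the sketch runs headless
plot(scores)                             # scatterplot matrix of every pair of columns
pairs(scores, main = "All score pairs")  # the same kind of matrix via pairs()
plot(scores[, c("PRE1", "POST1")])       # only the columns of interest
invisible(dev.off())                     # close the null device
```

Subsetting with scores[, c(...)] is handy when a dataset has many variables and the full matrix becomes too small to read.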

Custom Plotting

Additional plot() Arguments

Up to this point, we have been using the default values for all of our scatterplots' elements. However, R also allows for the customization of scatterplots. In addition to the x- and y-axis variables, the plot() function also accepts the following arguments ("The Default Scatterplot Function", n.d.).

  • main: the title for the plot (displayed at the top)
  • sub: the subtitle for the plot (displayed at the bottom)
  • xlim: the x-axis scale; uses the format c(min, max); automatically determined by default
  • ylim: the y-axis scale; uses the format c(min, max); automatically determined by default
  • xlab: the x-axis title
  • ylab: the y-axis title
  • Even more arguments are accepted by the plot() function. Take a look at the referenced page if you wish to explore further options.

Now let's recreate the original plot depicting the relationship between pretest 1 and posttest 1 with more detailed and meaningful parameters.

    # create a detailed scatterplot of Y on X incorporating the optional arguments of the plot() function
    # set axis scales for x and y to range between 0 and 20
    # set main title and subtitle
    # set x and y axis labels
    plot(PRE1, POST1, xlim = c(0, 20), ylim = c(0, 20),
         main = "Posttest 1 on Pretest 1", sub = "A Scattered Tale",
         xlab = "Pretest 1 Score", ylab = "Posttest 1 Score")

The output of the preceding function is pictured below.

Advanced Plotting

There are numerous graphical arguments available to functions in R. In this tutorial, just a few of the common aesthetic options will be addressed below ("Set or Query Graphical Parameters", n.d.).

  • col: determines the colors used for points and lines; accepts character strings of color names (e.g. "red", "green", etc.)
  • pch: the type of point to use (e.g. circle, square, triangle, etc.); accepts values 0-25 for symbols and 32-255 for characters
  • cex: the amount to scale the size of points; accepts a numeric value; default is 1
  • lty: defines the line type; accepts various character strings (e.g. "solid", "dashed", "dotted", etc.); applies to the lines plot() draws for line-based types such as type = "l" or type = "b"
  • lwd: defines the line width; accepts a positive number; default is 1; also sets the outline thickness of point symbols

Many more graphical parameters are available. Take a look at the referenced page if you wish to explore further options.

Now let's recreate the plot of posttest 1 on pretest 1 yet again, but this time with the inclusion of customized aesthetic parameters.

    # create a scatterplot of Y on X incorporating the custom aesthetic parameters of the plot() function
    # set point colors to dark green, red, and orange
    # set point markers to circle, square, and diamond
    # set point size to three times the default
    # set line type to solid and line width to three times the default
    # (with points only, the wider lines thicken the symbol outlines)
    plot(PRE1, POST1, xlim = c(0, 20), ylim = c(0, 20),
         main = "Posttest 1 on Pretest 1", sub = "A Scattered Tale",
         xlab = "Pretest 1 Score", ylab = "Posttest 1 Score",
         col = c("dark green", "red", "orange"), pch = c(21, 22, 23),
         cex = 3, lty = "solid", lwd = 3)

The output of the preceding function is pictured below.

Note that the c() function is used for a number of the parameters in the plot function above. This allows one to define multiple values as a "vector" that can be fed into a single argument. For example, if one wanted to use only a single color, then col = "red" would be acceptable. However, to use multiple colors, all items must be placed into a vector such as col = c("red", "green", "blue"). Without a vector, as in col = "red", "green", "blue", an error would occur because the extra colors would be treated as separate arguments rather than a single entity. Also note that when a vector holds fewer values than there are points, R recycles it, so in the example above the three colors and three markers simply alternate from one point to the next.
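The recycling behaviour is easy to see without drawing anything. R's point-drawing functions reuse vector arguments such as col and pch in order, and rep_len() reproduces that pattern. This is a small illustrative sketch; the seven-point count is arbitrary.

```r
# Three colors and three markers, as in the example above
point_colors  <- c("darkgreen", "red", "orange")
point_symbols <- c(21, 22, 23)

# How the values line up against seven points: reused in order
rep_len(point_colors, 7)
rep_len(point_symbols, 7)

# A single value is already a length-one vector, so c() is optional there
identical("red", c("red"))   # TRUE
```

If you want colors to mark groups rather than alternate, the col vector has to contain one entry per point (for example, one color per group label).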

Complete Plot Examples

To see a complete example of how scatterplots can be created in R, please download the plot examples (.txt) file.

Even More Visualizations

R has much more sophisticated graphic capabilities than have been demonstrated in this tutorial. In fact, opportunities exist to make very complex and unique visuals. To see examples of the kinds of charts that can be generated with R, I recommend that you visit the R Graph Gallery (François, 2006).

References

François, R. (2006). R graph gallery: Enhance your data visualization with R. Retrieved November 11, 2009 from http://addictedtor.free.fr/graphiques

Moore, D., and McCabe, G. (1989). Introduction to the practice of statistics [Data File]. Retrieved October 27, 2009 from http://lib.stat.cmu.edu/DASL/Datafiles/ReadingTestScores.html

Set or Query Graphical Parameters. (n.d.). Retrieved November 11, 2009 from http://sekhon.berkeley.edu/graphics/html/par.html

The Default Scatterplot Function. (n.d.). Retrieved November 11, 2009 from http://sekhon.berkeley.edu/graphics/html/plotdefault.html


Friday, November 6, 2009


R Tutorial Series: Zero-Order Correlations

One of the most common and basic techniques for analyzing the relationships between variables is zero-order correlation. This tutorial will explore the ways in which R can be used to employ this method.

Tutorial Files

Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains pretest and posttest scores for 66 subjects on a series of reading comprehension tests (Moore & McCabe, 1989). Note that all code samples in this tutorial assume that this data has already been read into an R variable and has been attached.

Correlation Between Two Variables

The most fundamental way to calculate correlations is to directly operate on two variables. In R, this can be done using the cor() function. The cor() function accepts the following arguments ("Correlation, Variance...", n.d.).

  • x: the first variable to correlate
  • y: the second variable to correlate
  • use (optional): determines how missing values are handled; accepts "all.obs", "complete.obs", or "pairwise.complete.obs"
  • method (optional): determines the statistical method used; accepts "pearson" (the default), "kendall", or "spearman"

In most cases, x and y are the only arguments that you will use when running the cor() function. The basic format for calculating a correlation is cor(VAR1, VAR2), where VAR1 and VAR2 are the variables that you would like to correlate.

cor(VAR1, VAR2) Example

Suppose that our research question is: "How does a subject's pretest 1 score relate to his or her posttest 1 score?" The following example demonstrates how to use the cor() function to calculate the correlation between pretest 1 (PRE1) and posttest 1 (POST1).

    > # use cor(VAR1, VAR2) to calculate the correlation between variable 1 and variable 2
    > cor(PRE1, POST1)
    [1] 0.5659026
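To show what cor() is computing in the default Pearson case, here is a sketch that calculates r directly from its definition and compares it with the built-in result. It also checks the fact that Spearman's rho is Pearson's r applied to the ranks of the data. The two vectors are made-up scores, not the tutorial's dataset.

```r
# Made-up scores (not the Moore & McCabe data)
x <- c(4, 6, 9, 12, 10)
y <- c(5, 8, 7, 11, 12)

# Pearson's r: sum of cross-products of deviations divided by
# the square root of the product of the sums of squared deviations
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_builtin <- cor(x, y)                  # method = "pearson" is the default
all.equal(r_manual, r_builtin)          # TRUE

# Spearman's rho is Pearson's r computed on ranks
rho <- cor(x, y, method = "spearman")
all.equal(rho, cor(rank(x), rank(y)))   # TRUE
```

all.equal() is used instead of == because the two routes can differ in the last few floating-point digits.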

Correlations Between Multiple Variables

When beginning to analyze a dataset, researchers often want to get a complete picture of all correlations, rather than just a single one. Conveniently, the cor() function can also be run on an entire set of data. The format for this operation is cor(DATAVAR), where DATAVAR is the name of the R variable containing the data.

cor(DATAVAR) Example

Suppose now that our research question is: "How do all of the test scores in the dataset relate to each other?" The following example demonstrates how to use the cor() function to calculate all of the correlations in a dataset.

    > # use cor(DATAVAR) to get the correlations between all variables
    > cor(datavar)

The output of the preceding function is pictured below.

Complete Correlational Analysis

To see a complete example of how correlational analysis can be conducted in R, please download the correlational analysis example (.txt) file.
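As a self-contained complement to cor(DATAVAR), the sketch below runs it on a small made-up data frame. The result is a square matrix, symmetric, with 1s on the diagonal, and each off-diagonal cell matches the corresponding two-variable call. The second half illustrates the use argument: with the default (use = "everything" in current R), any pair involving a missing value comes back NA, while "pairwise.complete.obs" drops only the incomplete pairs.

```r
# Made-up scores with three columns (not the tutorial's dataset)
scores <- data.frame(PRE1  = c(4, 6, 9, 12, 10),
                     PRE2  = c(3, 7, 8, 11, 9),
                     POST1 = c(5, 8, 7, 11, 12))

r_matrix <- cor(scores)          # 3 x 3 matrix, one row/column per variable
round(r_matrix, 3)
# Any cell matches the two-variable call
all.equal(r_matrix["PRE1", "POST1"], cor(scores$PRE1, scores$POST1))

# Missing data: introduce one NA in a copy
scores_na <- scores
scores_na$PRE2[2] <- NA
cor(scores_na)["PRE2", "POST1"]                                 # NA
cor(scores_na, use = "pairwise.complete.obs")["PRE2", "POST1"]  # uses the 4 complete pairs
cor(scores_na, use = "complete.obs")                            # drops row 2 for every pair
```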

References

Correlation, Variance and Covariance (Matrices). (n.d.). Retrieved October 27, 2009 from http://sekhon.berkeley.edu/stats/html/cor.html

Moore, D., and McCabe, G. (1989). Introduction to the practice of statistics [Data File]. Retrieved October 27, 2009 from http://lib.stat.cmu.edu/DASL/Datafiles/ReadingTestScores.html


Sunday, November 1, 2009


R Tutorial Series: Summary and Descriptive Statistics

Summary (or descriptive) statistics are the first figures used to represent nearly every dataset. They also form the foundation for much more complicated computations and analyses. Thus, in spite of being composed of simple methods, they are essential to the analysis process. This tutorial will explore the ways in which R can be used to calculate summary statistics, including the mean, standard deviation, range, and percentiles. Also introduced is the summary function, which is one of the most useful tools in the R set of commands.

Tutorial Files

Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains hypothetical age and income data for 20 subjects. Note that all code samples in this tutorial assume that this data has already been read into an R variable and has been attached.

Mean

In R, a mean can be calculated on an isolated variable via the mean(VAR) command, where VAR is the name of the variable whose mean you wish to compute. Alternatively, a mean can be calculated for each of the variables in a dataset by using the mean(DATAVAR) command, where DATAVAR is the name of the variable containing the data. The code sample below demonstrates both uses of the mean function.

    > # calculate the mean of a variable with mean(VAR)
    > # what is the mean Age in the sample?
    > mean(Age)
    [1] 32.3
    > # calculate the mean of all variables in a dataset with mean(DATAVAR)
    > # what is the mean of each variable in the dataset?
    > mean(dataset)
        Age  Income
       32.3 34000.0
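One caution for readers on a newer version of R: mean(DATAVAR) on a data frame worked when this tutorial was written, but recent R releases no longer support it (the call warns and returns NA). A minimal sketch of the usual replacements, colMeans() and sapply(), follows; the data frame is made up and only mimics the Age and Income columns.

```r
# Made-up data with the same column names as the tutorial's file
dataset <- data.frame(Age    = c(25, 40, 31, 52),
                      Income = c(30000, 52000, 41000, 61000))

mean(dataset$Age)        # mean of a single variable: 37
colMeans(dataset)        # named vector: one mean per column
sapply(dataset, mean)    # same result, applying mean() to each column
```

colMeans() is the more direct choice for all-numeric data; sapply() generalizes to any per-column function, which the next sections rely on.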

Standard Deviation

Within R, standard deviations are calculated in the same way as means. The standard deviation of a single variable can be computed with the sd(VAR) command, where VAR is the name of the variable whose standard deviation you wish to retrieve. Similarly, a standard deviation can be calculated for each of the variables in a dataset by using the sd(DATAVAR) command, where DATAVAR is the name of the variable containing the data. The code sample below demonstrates both uses of the standard deviation function.

    > # calculate the standard deviation of a variable with sd(VAR)
    > # what is the standard deviation of Age in the sample?
    > sd(Age)
    [1] 19.45602
    > # calculate the standard deviation of all variables in a dataset with sd(DATAVAR)
    > # what is the standard deviation of each variable in the dataset?
    > sd(dataset)
            Age      Income
       19.45602 32306.10175
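The sketch below shows what sd() computes: the square root of the sample variance, which divides by n - 1 rather than n. As with mean(), recent versions of R no longer accept sd(DATAVAR) on a data frame, so the per-column version is written with sapply(). The values are made up.

```r
# Made-up data with the same column names as the tutorial's file
dataset <- data.frame(Age    = c(25, 40, 31, 52),
                      Income = c(30000, 52000, 41000, 61000))

age <- dataset$Age
n <- length(age)

# Sample standard deviation from its definition (denominator n - 1)
sd_manual <- sqrt(sum((age - mean(age))^2) / (n - 1))
all.equal(sd_manual, sd(age))   # TRUE

sapply(dataset, sd)             # one standard deviation per column
```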

Range

Minimum and Maximum

Keeping with the pattern, a minimum can be computed on a single variable using the min(VAR) command. The maximum, via max(VAR), operates identically. However, in contrast to the mean and standard deviation functions, min(DATAVAR) or max(DATAVAR) will retrieve the minimum or maximum value from the entire dataset, not from each individual variable. Therefore, it is recommended that minimums and maximums be calculated on individual variables, rather than entire datasets, in order to produce more useful information. The sample code below demonstrates the use of the min and max functions.

    > # calculate the min of a variable with min(VAR)
    > # what is the minimum age found in the sample?
    > min(Age)
    [1] 5
    > # calculate the max of a variable with max(VAR)
    > # what is the maximum age found in the sample?
    > max(Age)
    [1] 70
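The difference between min(DATAVAR) and per-variable minimums is easy to see on a small made-up data frame. min() and max() on the whole data frame return a single value pooled across all columns, whereas sapply() returns one value per variable, which is usually the more useful result.

```r
# Made-up data with the same column names as the tutorial's file
dataset <- data.frame(Age    = c(25, 40, 31, 52),
                      Income = c(30000, 52000, 41000, 61000))

min(dataset)            # 25: smallest value anywhere in the data frame
max(dataset)            # 61000: largest value anywhere in the data frame
sapply(dataset, min)    # per variable: Age 25, Income 30000
sapply(dataset, max)    # per variable: Age 52, Income 61000
```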

Range

The range of a particular variable, that is, its minimum and maximum, can be retrieved using the range(VAR) command. As with the min and max functions, range(DATAVAR) is not very useful, since it considers the entire dataset rather than each individual variable. Consequently, it is recommended that ranges also be computed on individual variables. This operation is demonstrated in the following code sample.

    > # calculate the range of a variable with range(VAR)
    > # what range of age values are found in the sample?
    > range(Age)
    [1]  5 70
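Because range() returns a two-element vector (minimum first, then maximum), it combines naturally with other functions. The sketch below, on made-up data, takes its difference to get the width of the interval and uses sapply() to get the range of every variable at once.

```r
# Made-up data with the same column names as the tutorial's file
dataset <- data.frame(Age    = c(25, 40, 31, 52),
                      Income = c(30000, 52000, 41000, 61000))

range(dataset$Age)         # 25 52: minimum first, then maximum
diff(range(dataset$Age))   # 27: width of the interval
sapply(dataset, range)     # 2-row matrix: row 1 = minima, row 2 = maxima
```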

Percentiles

Values from Percentiles (Quantiles)

Given a variable and a desired percentile, the corresponding value can be found using the quantile(VAR, c(PROB1, PROB2,…)) command. Here, VAR refers to the variable name and PROB1, PROB2, etc., are probability values. The probabilities must be between 0 and 1, which makes them the decimal versions of the desired percentiles (e.g. 50% = 0.5). The following example shows how this function can be used to find the data value that corresponds to a desired percentile.

    > # calculate desired percentile values using quantile(VAR, c(PROB1, PROB2,...))
    > # what are the 25th and 75th percentiles for age in the sample?
    > quantile(Age, c(0.25, 0.75))
      25%   75%
    17.75 44.25

Note that the quantile(VAR) command can also be used. When probabilities are not specified, the function defaults to computing the 0, 25, 50, 75, and 100 percentile values, as shown in the following example.

    > # calculate the default percentile values using quantile(VAR)
    > # what are the 0, 25, 50, 75, and 100 percentiles for age in the sample?
    > quantile(Age)
       0%   25%   50%   75%  100%
     5.00 17.75 30.00 44.25 70.00
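Values such as 17.75 appear because quantile() interpolates between data points. R offers several definitions through its type argument; the default, type 7, sorts the data, computes the position h = (n - 1) * p + 1, and interpolates linearly between the sorted values on either side of h. The sketch below reimplements that rule to show where the numbers come from. quantile_type7() is a hypothetical helper name, and the ages are made up.

```r
# Hand-written version of quantile()'s default (type 7) rule
quantile_type7 <- function(x, p) {        # hypothetical helper name
  x <- sort(x)
  n <- length(x)
  sapply(p, function(prob) {
    h  <- (n - 1) * prob + 1              # position within the sorted data
    lo <- floor(h)
    hi <- min(lo + 1, n)                  # stay inside the vector when p = 1
    x[lo] + (h - lo) * (x[hi] - x[lo])    # linear interpolation
  })
}

ages <- c(5, 12, 18, 25, 30, 33, 41, 47, 58, 70)   # made-up ages
quantile(ages, c(0.25, 0.75))             # built-in: 19.75 and 45.5
quantile_type7(ages, c(0.25, 0.75))       # same numbers, computed by hand
```

For the 25th percentile of these ten ages, h = 9 * 0.25 + 1 = 3.25, so the result lies a quarter of the way from the 3rd sorted value (18) to the 4th (25), giving 19.75.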

Percentiles from Values (Percentile Rank)

In the opposite situation, where a percentile rank corresponding to a given value is needed, one has to devise a custom method. To begin, consider the steps involved in calculating a percentile rank.

  1. count the number of data points that are at or below the given value
  2. divide by the total number of data points
  3. multiply by 100

From the preceding steps, the formula for calculating a percentile rank can be derived: percentile rank = length(VAR[VAR <= VAL]) / length(VAR) * 100, where VAR is the name of the variable and VAL is the given value. This formula uses the length function in two ways. The first, length(VAR[VAR <= VAL]), counts the number of data points in the variable that are at or below the given value. Note that the "<=" operator can be replaced with <, >, >=, or == if a different scenario calls for it. The second, length(VAR), counts the total number of data points in the variable. Together, they accomplish steps one and two of the percentile rank computation. The final step is to multiply the result of the division by 100 to transform the decimal value into a percentage. A sample percentile rank calculation is demonstrated below.

    > # calculate the percentile rank for a given value using the custom formula: length(VAR[VAR <= VAL]) / length(VAR) * 100
    > # in the sample, an age of 45 is at what percentile rank?
    > length(Age[Age <= 45]) / length(Age) * 100
    [1] 75
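If you need percentile ranks repeatedly, the formula can be wrapped in a small function. percentile_rank() below is a hypothetical helper, not a built-in R function, and the ages are made up. The last line shows an equivalent shortcut: the mean of a logical vector is the proportion of TRUE values.

```r
# Hypothetical helper wrapping the custom percentile-rank formula
percentile_rank <- function(x, value) {
  # share of observations at or below `value`, as a percentage
  length(x[x <= value]) / length(x) * 100
}

ages <- c(5, 12, 18, 25, 30, 33, 41, 47, 58, 70)   # made-up ages
percentile_rank(ages, 45)    # 70: seven of the ten ages are <= 45

# Same proportion, computed as the mean of a logical vector
mean(ages <= 45) * 100       # 70
```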

Summary

A very useful multipurpose function in R is summary(X), where X can be one of any number of objects, including datasets, variables, and linear models, just to name a few. When used, the command provides summary data related to the individual object that was fed into it. Thus, the summary function has different outputs depending on what kind of object it takes as an argument. Besides being widely applicable, this method is valuable because it often provides exactly what is needed in terms of summary statistics. A couple of examples of how summary(X) can be used are displayed in the following code sample. I encourage you to use the summary command often when exploring ways to analyze your data in R. This function will be revisited throughout the R Tutorial Series.

    > # summarize a variable with summary(VAR)
    > summary(Age)

The output of the preceding summary is pictured below.

    > # summarize a dataset with summary(DATAVAR)
    > summary(dataset)

The output of the preceding summary is pictured below.

Complete Summary Statistics Analysis

To see a complete example of how summary statistics can be used to analyze data in R, please download the summary statistics analysis example (.txt) file.
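The reason summary() adapts to its input is that it is a generic function: R chooses a method based on the class of the object passed in. The sketch below, on made-up values, calls it on a numeric vector, a data frame, and a fitted linear model to show the three different kinds of output.

```r
# Made-up values (not the tutorial's dataset)
ages   <- c(5, 12, 18, 25, 30, 33, 41, 47, 58, 70)
income <- c(8, 15, 22, 30, 35, 38, 44, 50, 52, 55) * 1000
people <- data.frame(Age = ages, Income = income)

summary(ages)                    # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(people)                  # the same six statistics for each column
fit <- lm(Income ~ Age, data = people)
summary(fit)                     # coefficients, R-squared, and related output
```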

Up Next: Zero-Order Correlations

Thank you for participating in the Summary and Descriptive Statistics tutorial. I hope that it has been useful to your work with R and statistics. Please let me know of any feedback, questions, or requests that you have in the comments section of this article. Our next guide will be on the topic of Zero-Order Correlations.