Tuesday, March 30, 2010

An introduction to R statistics

Recently, I spent some time learning the R environment. It took me a little while to get it, so I would like to describe the system in my own way, in the hope that it helps those whose brains are wired like mine. I do not intend to cover the details of R, only the basics.

Data in the R environment are objects that can have properties, but these objects do not carry methods. In that sense they are more like C structures than full-featured objects. R also does not support the Object.Property or Object.method() syntax; instead, the dot (.) is an allowed character in identifiers. Properties and methods are accessed through functions. With this view in mind, we can better understand the limitations of R and how it is constructed.

With this approach to objects, a function can be made to operate on multiple types of objects by checking the type of the object it receives. In R, the properties of the basic types are documented. The intrinsic properties are mode, length and class, and they are accessed through dedicated functions: mode(), length() and class(). Other properties (attributes) are accessed through attr(). The full list of attributes can be viewed with attributes().
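As a minimal sketch of the accessor functions named above (the vector x and the attribute name "unit" are made up for illustration):

```r
# Hypothetical example vector with a names attribute.
x <- c(a = 1.5, b = 2.5, c = 4)

mode(x)      # intrinsic property: the storage kind of the components
length(x)    # intrinsic property: the number of components
class(x)     # intrinsic property: the class

attributes(x)       # list of the non-intrinsic attributes (here: names)
attr(x, "names")    # read one attribute by name

# attr() can also set an attribute; "unit" is an invented attribute name.
attr(x, "unit") <- "cm"
attr(x, "unit")
```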

R supports vector operations directly in its syntax. For example, A*B multiplies two vectors component by component. This makes R an ideal tool for expressing matrix operations and for carrying out operations on tabulated data, like those found in linear algebra and statistical surveys.
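A small sketch of what the vector syntax looks like in practice (A, B and M are illustrative names). Note that * is component-wise, while the matrix product uses the separate %*% operator:

```r
A <- c(1, 2, 3)
B <- c(4, 5, 6)

A * B        # component-wise product: 4 10 18
A + 10       # the single value is recycled across A: 11 12 13
sum(A * B)   # dot product written out by hand: 32

M <- matrix(1:4, nrow = 2)   # 2 x 2 matrix, filled column by column
M * M        # component-wise product
M %*% M      # matrix product
```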

Basic Data Type: Vector
The simplest R object type is the vector, an ordered collection of components of the same kind. Components can be numeric, complex, character, logical and others, and any component may be NA, which marks a missing value. Vectors have the mode property, which can be numeric, complex, character, logical and others; the mode() function gives access to it. The other property of a vector is its length, the number of components, which is accessed through the length() function. The names property is also supported: it is itself a vector and gives each component a name. Components can be referred to either by integer index or by name.
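A short sketch of these vector properties, using a made-up vector v:

```r
v <- c(10, 20, 30)
names(v) <- c("first", "second", "third")

mode(v)                    # "numeric"
length(v)                  # 3
v[2]                       # access by integer index
v["third"]                 # access by name
v[c(TRUE, FALSE, TRUE)]    # access by logical vector

# Components must be of the same kind, so mixed input is coerced.
mode(c(1, "a", TRUE))      # "character"
```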

Basic Data Type: Factor
A factor is a vector object with the levels property. The levels property is a vector of the unique values of the original vector, and it gives those values an order.

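R objects also have a property called class. The class of a plain vector generally matches its mode (an integer vector is an exception: its class is 'integer' while its mode is 'numeric'). The class of a factor is 'factor'. The class property can be accessed via the class() function.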
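To make the levels and class properties concrete, here is a minimal sketch (the sizes data are invented for illustration):

```r
sizes <- c("small", "large", "medium", "small", "large")

f <- factor(sizes)
levels(f)       # unique values, sorted alphabetically by default
class(f)        # "factor"
as.integer(f)   # integer codes that point into levels(f)

# An explicit order for the levels can be supplied instead.
g <- factor(sizes, levels = c("small", "medium", "large"))
levels(g)

class(c(1.5, 2))   # "numeric"
class("text")      # "character"
```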

Basic Data Type: Array
An array is a vector object with the dim property. dim is a vector of positive integers whose components specify the size of each dimension. A matrix is an array of two dimensions. The mode of an array is the same as the mode of its components. The class of an array is 'array', except that a matrix reports 'matrix' (together with 'array' in R 4.0 and later). An array can be created by combining vectors or by setting a vector's dim property.
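Both ways of creating an array can be sketched as follows (v, m and a are illustrative names):

```r
v <- 1:6

# Setting dim turns the vector into a 2 x 3 matrix, filled column by column.
m <- v
dim(m) <- c(2, 3)
is.matrix(m)   # TRUE
dim(m)         # 2 3
mode(m)        # same mode as the components

# The same values arranged as a three-dimensional array.
a <- array(v, dim = c(1, 2, 3))
length(dim(a)) # 3 dimensions

# Combining vectors by column or by row also yields a matrix.
cbind(1:2, 3:4)
rbind(1:2, 3:4)
```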

Basic Data Type: List
By combining objects of different types in an ordered collection, we create a list object. A list object can have a names property, a vector of mode character that gives each list component a name. List objects have the class 'list'.
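A minimal sketch of a list holding components of different types (the person list is invented for illustration):

```r
person <- list(name = "Ann", scores = c(90, 85), passed = TRUE)

class(person)         # "list"
names(person)         # "name" "scores" "passed"
mode(names(person))   # "character"
length(person)        # 3 components
person$scores         # the numeric vector stored under "scores"
```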

Basic Data Type: data.frame
A data.frame object is an extension of the list object, with a restriction placed on the size of the list components so that the data.frame resembles a table in which every column has the same number of values. These list components can be vectors, factors, matrices or lists. A data.frame has the class 'data.frame'.
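The list-with-restrictions view can be sketched like this (the column names and values are made up):

```r
df <- data.frame(
  id    = 1:3,
  group = factor(c("a", "b", "a")),
  score = c(2.5, 3.0, 4.5)
)

class(df)            # "data.frame"
is.list(df)          # TRUE: a data.frame is built on top of a list
length(df)           # number of columns
nrow(df)             # number of rows
df$score             # one column, returned as a vector
sapply(df, length)   # every column has the same length
```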

Useful Functions/Operators
Environmental
getwd(), setwd(), objects(), ls(), library()

Constructors
Bgn:End (colon), c(), vector(), factor(), list(), data.frame(), matrix(), cbind(), rbind()

Casting
as.vector(), as.factor() ...

Indexing
  • [] returns an object of the same type
  • Vctr[ NdxVctr ], Vctr[ NmsVctr ], Vctr[ LgcVctr ] ...
  • Mtrx[ NdxVctr1, NdxVctr2 ] ...
  • [[ Ndx ]] and $ return the component itself.
[], [[]], $
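The difference between the three indexing operators can be sketched as follows (v, m and l are illustrative objects):

```r
v <- c(a = 1, b = 2, c = 3)
v[c(1, 3)]      # [] on a vector returns a vector
v[v > 1]        # logical index: keeps components greater than 1

m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column
m[1, c(2, 3)]   # row 1, columns 2 and 3
m[, 1]          # an empty index selects the whole dimension

l <- list(x = 1:3, y = "text")
l["x"]          # [] on a list returns a list of length 1
l[["x"]]        # [[ ]] returns the component itself
l$x             # $ is the by-name shorthand for [[ ]]
```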

Other
assign(), <-, ->, grep(), function(), tapply(), is.na(), is.vector(), summary(), names()
& (element-wise), | (element-wise), &&, ||
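A few of the functions and operators listed above, in a minimal sketch with invented data:

```r
x <- c(1, NA, 3, 4)
is.na(x)          # TRUE where a value is missing

a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)
a & b             # element-wise AND, returns a vector
a | b             # element-wise OR, returns a vector
a[1] && b[1]      # && works on single logical values

# tapply() applies a function to values grouped by a factor.
score <- c(10, 20, 30, 40)
group <- factor(c("u", "v", "u", "v"))
means <- tapply(score, group, mean)

# grep() returns the indexes of the elements that match a pattern.
hits <- grep("a", c("apple", "kiwi", "banana"))
```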

Control Structure
  • if (Exp1) Exp2 else Exp3
  • ifelse( LgcVctr, TrVctr, FlsVctr) returns a vector with components taken from TrVctr and FlsVctr based on LgcVctr
  • for ( Ndx in Vctr) { Expr... }
  • while (Cndtn) { Expr ... }
  • break
  • Vrbl <- function (Arg1, Arg2, ...) { Expr ... }
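The control structures listed above can be sketched together in a few lines (classify, signs, total and n are illustrative names):

```r
# A function definition using if / else; classify is a hypothetical name.
classify <- function(x) {
  if (x < 0) "negative" else "non-negative"
}

# ifelse() is the vectorized counterpart of if / else.
signs <- ifelse(c(-2, 0, 5) < 0, "neg", "non-neg")

# for loops over the components of a vector.
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while with break to leave the loop early.
n <- 0
while (TRUE) {
  n <- n + 1
  if (n >= 3) break
}
```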
The above provides enough information for a basic understanding of R. For details, please see the R-Intro and the R-Reference.pdf.

Sunday, March 14, 2010

News feeds to China public

According to the news, Google could retreat from China as soon as the end of March 2010. Recently, I read a news article that I feel is really worth reading for all Chinese people, including people living in Taiwan, and I began to think about how I can help feed news to the Chinese living inside China's Great Firewall.

I am sorry to say: "Bill Gates, I do not agree with you that there are a lot of (technical) ways for Chinese people to reach the outside world." I believe they need help. One thing I hope can be done is to defeat the filtering of search results. Here are some of my ideas.

1. Create sites with articles that are permissible to the Chinese Communist Party so that they can be indexed. We would then provide a CAPTCHA and allow users to view the unfiltered results, possibly from Google.
2. Encourage everyone who cares about this issue to set up various web sites so that it becomes extremely expensive for the Chinese Communist Party to filter all of these sites.

Comments are welcome. I believe there are a lot of smart people who can provide even better ideas, such as dynamically registering new sites from time to time.