The study of quasars, especially high-redshift quasars, is of great importance to understanding the formation and evolution of galaxies and the early history of the universe. With the development and deployment of large spectroscopic sky-survey projects (e.g. 2dF, SDSS), the number of known quasars has grown to more than 200,000. To improve the efficiency of high-cost telescopes, careful selection of observational targets is necessary, so various quasar-targeting algorithms have been developed and applied to different data. We review them in detail. Statistical approaches are based on photometric colors, variability, UV excess, BRX, radio properties, color-color cuts and so on. Automated methods include support vector machines (SVMs), kernel density estimation (KDE), artificial neural networks (ANNs), the extreme-deconvolution method, probabilistic principal surfaces (PPS) and negative entropy clustering (NEC), among others. In addition, we touch upon some quasar candidate catalogues created by different algorithms.
Based on photometric and spectroscopic data of quasars from SDSS DR7 and UKIDSS DR7, support vector machines (SVMs) are applied to predict the photometric redshifts of quasars. Different input patterns are tried and the best pattern is presented. Comparing the results obtained with optical data alone against those obtained with combined optical and infrared data, the experiments show that accuracy improves when data from more bands are used. In addition, the quasar sample is first clustered into two groups by a one-class SVM, and the photometric redshifts of the two groups are then estimated separately by SVM. The results based on the whole sample and the combined results from the two groups are comparable.
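The band-comparison experiment above can be sketched with scikit-learn's support vector regression. This is a minimal illustration, not the authors' pipeline: the "colors" below are synthetic stand-ins for real SDSS/UKIDSS quasar photometry, constructed only so that adding bands adds information.

```python
# Sketch of SVM photometric-redshift regression: compare accuracy with
# optical colors alone vs. optical + infrared colors (synthetic data).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
z = rng.uniform(0.1, 5.0, n)                                 # "spectroscopic" redshifts
# each toy color traces z with independent noise (stand-ins for u-g, ..., J-H)
optical = np.column_stack([z + rng.normal(0, 0.3, n) for _ in range(4)])
infrared = np.column_stack([z + rng.normal(0, 0.3, n) for _ in range(3)])

def rms_error(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
    model.fit(X_tr, y_tr)
    return np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2))

err_opt = rms_error(optical, z)
err_all = rms_error(np.hstack([optical, infrared]), z)
print(err_opt, err_all)   # extra bands should reduce the scatter here
```

On real photometry the gain from infrared bands comes from breaking color degeneracies, not just noise averaging as in this toy.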
A photometric redshift provides an estimate of the distance of an astronomical object, such as a galaxy or quasar. It is a powerful statistical tool for studying the evolutionary properties of galaxies, in particular faint galaxies, whose spectroscopic data are difficult or impossible to obtain. At present there are many methods for estimating the photometric redshifts of galaxies and quasars. These methods fall into two kinds: template-fitting methods and empirical methods. The commonly used techniques of both kinds are described, and the differences between the approaches for quasars and those for galaxies are pointed out. Methods that perform well on galaxies may perform poorly on quasars, and template-fitting methods and empirical methods each have their pros and cons.
KEYWORDS: Databases, Astronomy, Data integration, Computing systems, Observatories, Interfaces, Java, System integration, Data processing, Visualization
Astronomy has stepped into a full-wavelength, data-avalanche era. Astronomical data are measured in terabytes, even petabytes. How to store, manage and analyze such massive data is an important issue in astronomy. To free astronomers from the burden of data processing and let them concentrate on science, various valuable and convenient tools (e.g. Aladin, VOSpec, VOPlot) have been developed by VO projects. To meet this requirement, we develop a toolkit that automates database creation, database index creation and cross-matching. The toolkit provides a good interface for users. Cross-match tasks may be run between local databases, between remote databases, or between a local and a remote database. Large-scale cross-matching is also easily achieved, and its speed is rather satisfactory.
The k-Nearest Neighbor (kNN) algorithm is one of the simplest, most flexible and most effective classification algorithms, and it has been widely used in many fields. Using multi-band samples extracted from the large SDSS DR7 and UKIDSS DR3 surveys, we investigate the performance of kNN with different combinations of colors for selecting quasar candidates. The color histograms of quasars and stars are helpful for choosing the optimal input pattern for the kNN classifier. The best input pattern is (u-g, g-r, r-i, i-z, z-Y, Y-J, J-H, H-K, Y-K, g-z). In our case the performance of kNN is assessed with several metrics, which indicate that kNN discriminates quasars from stars with rather high performance. As a result, kNN is an applicable and effective method for selecting quasar candidates in large sky-survey projects.
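The selection procedure can be sketched in a few lines of scikit-learn. This is a toy, not the paper's experiment: the two "color" features are synthetic Gaussians whose offset loosely mimics the UV excess that separates quasars from stars in color space.

```python
# Sketch of kNN quasar/star classification on color features
# (synthetic stand-ins for the u-g, g-r, ... colors above).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
quasars = rng.normal([0.2, 0.2], 0.3, size=(n, 2))   # bluer in the toy "u-g"
stars = rng.normal([1.2, 0.6], 0.3, size=(n, 2))
X = np.vstack([quasars, stars])
y = np.array([1] * n + [0] * n)                      # 1 = quasar, 0 = star

knn = KNeighborsClassifier(n_neighbors=10)
scores = cross_val_score(knn, X, y, cv=5)            # 5-fold cross-validation
print(scores.mean())
```

In practice the input pattern (which colors to feed in) matters more than k, which is exactly what the histogram comparison above is used to decide.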
Faced with very large and frequently high-dimensional data in astronomy, the effectiveness and efficiency of algorithms are always hot issues. An excellent algorithm must avoid the curse of dimensionality while remaining computationally efficient. Adopting survey data from optical bands (SDSS, USNO-B1.0) and the radio band (FIRST), we investigate feature weighting and feature selection by means of the random forest algorithm, and then employ a kd-tree based k-nearest neighbor method (KD-KNN) to discriminate quasars from stars. The performance of this approach with all features, weighted features and selected features is compared. The experiments show that accuracy improves when weighted or selected features are used. KD-KNN is a simple and efficient approach to nonparametric classification, and combined with random forests it is clearly more effective for separating quasars from stars with multi-wavelength data.
We investigate two methods, kernel regression and the nearest neighbor algorithm, for photometric redshift estimation with quasar samples from the SDSS (Sloan Digital Sky Survey) and UKIDSS (UKIRT Infrared Deep Sky Survey) databases. Both belong to the family of instance-based learning algorithms, which store all the training examples and "delay learning" until prediction time. The major difference between the two is that kernel regression returns a weighted average of the spectroscopic redshifts of the neighbors of a query point, while the nearest neighbor algorithm adopts the spectroscopic redshift of the single nearest neighbor. Each algorithm has its own advantages and disadvantages. Our experiments show that kernel regression obtains more accurate predictions, while the nearest neighbor algorithm shows its superiority especially for thinly spread data, e.g. high-redshift quasars.
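The contrast between the two estimators is easy to see in code. Below is a minimal sketch with a one-dimensional toy color-redshift relation (not the SDSS/UKIDSS data): kernel regression is a Nadaraya-Watson weighted average, while the nearest-neighbour estimator simply copies one training redshift.

```python
# Instance-based photo-z sketch: Gaussian-kernel regression vs. plain
# nearest-neighbour assignment, on a toy linear colour-redshift relation.
import numpy as np

rng = np.random.default_rng(3)
colour_train = rng.uniform(0, 4, 500)
z_train = 0.8 * colour_train + rng.normal(0, 0.05, 500)   # toy relation

def kernel_regression(x, bandwidth=0.2):
    # Nadaraya-Watson: weight each training redshift by kernel distance
    w = np.exp(-0.5 * ((colour_train - x) / bandwidth) ** 2)
    return np.sum(w * z_train) / np.sum(w)

def nearest_neighbour(x):
    # copy the redshift of the single closest training point
    return z_train[np.argmin(np.abs(colour_train - x))]

x_query = 2.0
print(kernel_regression(x_query), nearest_neighbour(x_query))
```

The averaging in kernel regression suppresses noise where training points are dense, but in sparse regions (the high-redshift tail) the kernel average pulls in distant, biased neighbours, which is why plain nearest-neighbour can win there.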
The k-Nearest Neighbor (kNN) algorithm is an effective classification approach among the statistical methods of pattern recognition, but it can be rather time-consuming on massive data, especially for the large survey projects in astronomy. NVIDIA CUDA is a general-purpose parallel computing architecture that leverages the parallel compute engines in NVIDIA graphics processing units (GPUs) to solve many complex computational problems in a fraction of the time required on a CPU. In this paper we implement a CUDA-based kNN algorithm and compare its performance with a CPU-only kNN algorithm, using single-precision and double-precision data types, on the classification of celestial objects. The results demonstrate that CUDA can speed up the kNN algorithm effectively and could be useful in astronomical applications.
Based on survey databases from different bands, we first employed the random forest approach for feature selection and feature weighting, and then investigated support vector machines (SVMs) for classifying quasars against stars. Two data sets were used: one from SDSS, USNO-B1.0 and FIRST (the FIRST sample), and another from SDSS, USNO-B1.0 and ROSAT (the ROSAT sample). The classification results with the different data sets were compared, and the SVM performance with different features was presented. The experiments showed that the accuracy with the FIRST sample was superior to that with the ROSAT sample; in addition, compared to the result with the original features, the performance with selected features improved while that with weighted features decreased. We therefore consider feature selection necessary when SVMs are applied for classification, since it not only improves performance but also reduces dimensionality. The good performance of SVMs indicates that they are an effective method for preselecting quasar candidates from multiwavelength data.
We present a comparative study of supervised classification algorithms applied to the classification of celestial objects. Three algorithms, Linear Discriminant Analysis (LDA), the K-Dimensional Tree (KD-tree) and Support Vector Machines (SVMs), are used to classify point sources from the Sloan Digital Sky Survey (SDSS) Data Release Seven. All of them are applied and tested on SDSS photometric data filtered by stringent conditions so that each method can perform at its best. All six performance metrics of SVMs reach very high values (99.00%). The performance of the KD-tree is also very good, with all six metrics above 97.00%. The performance of LDA is comparatively poor: although five metrics exceed 90.00%, the accuracy of positive prediction only reaches 85.98%. Moreover, we discuss which input pattern, i.e. which combination of parameters, is most effective for each method.
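A three-way comparison of this kind can be sketched with scikit-learn. The synthetic features below stand in for the filtered SDSS photometry, so the printed accuracies are illustrative only, not the percentages quoted above.

```python
# Sketch of the LDA vs. KD-tree kNN vs. SVM comparison on a toy
# point-source classification task (synthetic features).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1500, n_features=5, n_informative=4,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "KD-tree kNN": KNeighborsClassifier(n_neighbors=10, algorithm="kd_tree"),
    "SVM": SVC(kernel="rbf", gamma="scale"),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```

On well-separated data all three do well; the gaps the abstract reports come from the nonlinear structure of real photometric classes, which the linear LDA boundary cannot follow.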
We introduce an automated method, Support Vector Machines (SVMs), for quasar selection, with the aim of compiling an input catalogue for the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) and improving the efficiency of its 4000 fibers. The data are taken from the Sloan Digital Sky Survey (SDSS) Data Release Seven (DR7), the latest release at the time of writing. We carefully study the discrimination of quasars from stars by finding the separating hyperplane in the high-dimensional color space under different combinations of SVM model parameters, and give a clear way to find the optimal combination (C-+ = 2, C+- = 2, kernel = RBF, gamma = 3.2). Furthermore, we investigate the performance of SVMs for predicting the photometric redshifts of quasar candidates, obtaining optimal model parameters of (w = 0.001, C-+ = 1, C+- = 2, kernel = RBF, gamma = 7.5). The experiments show that both the precision and the recall of SVMs for separating quasars from stars can exceed 95%. Using the optimal model parameters, we estimate the photometric redshifts of 39,353 identified quasars and find that 72.99% of them are consistent with the spectroscopic redshifts within |Δz| < 0.2. This approach is effective and applicable for our problem.
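The asymmetric costs C-+ and C+- quoted above penalize the two error directions differently. In scikit-learn (used here only as an illustration, not the authors' implementation) the analogous knob is `class_weight`, which scales the penalty C per class and trades precision against recall; the data below are synthetic.

```python
# Illustration of asymmetric-cost SVM selection: weighting the quasar
# class more heavily shifts the boundary and raises quasar recall.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 1500
quasars = rng.normal(0.0, 1.0, (n, 3))      # toy color features, class 1
stars = rng.normal(2.0, 1.0, (n, 3))        # toy color features, class 0
X = np.vstack([quasars, stars])
y = np.array([1] * n + [0] * n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for w in ({0: 1, 1: 1}, {0: 1, 1: 4}):      # heavier penalty on missed quasars
    clf = SVC(kernel="rbf", C=2.0, class_weight=w).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    results[w[1]] = (precision_score(y_te, pred), recall_score(y_te, pred))
print(results)
```

For fiber allocation, high recall matters (missed quasars waste the survey's chance), which is why the cost ratio is tuned rather than left symmetric.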
We employ the k-nearest neighbor algorithm (KNN) for photometric redshift measurement of quasars with the Fifth Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). KNN is an instance-based learning algorithm in which the result for a new query is predicted from the closest training samples; the regressor fits no model and relies only on memory. Given a query quasar, we find the known quasars (training points) closest to it and assign it a redshift equal to the average of the redshifts of its k nearest neighbors. Three kinds of colors (PSF, Model or Fiber) together with spectroscopic redshifts are used as input parameters, separately; the combination of the three kinds of colors is also taken as input. The experiments indicate that the best input pattern over all experiments is PSF + Model + Fiber colors. With this pattern, 59.24%, 77.34% and 84.68% of photometric redshifts fall within Δz < 0.1, 0.2 and 0.3, respectively. When only one kind of color is used as input, the Model colors achieve the best performance; when two kinds are used, the best result is achieved by PSF + Fiber colors. In addition, the nearest neighbor method (k = 1) shows its superiority over KNN with k ≠ 1 for the given sample.
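The estimator described above (redshift = mean spectroscopic redshift of the k nearest neighbours in colour space) maps directly onto scikit-learn's `KNeighborsRegressor`. The sketch below uses synthetic colours in place of the PSF/Model/Fiber measurements, so the |Δz| fraction it prints is illustrative only.

```python
# Sketch of kNN photo-z: predict each quasar's redshift as the mean
# spectroscopic redshift of its nearest training neighbours (k = 1 here,
# the variant the abstract finds best for its sample).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(5)
n = 3000
z = rng.uniform(0.3, 2.5, n)
colours = np.column_stack([z + rng.normal(0, 0.2, n) for _ in range(4)])

knn = KNeighborsRegressor(n_neighbors=1).fit(colours[:2500], z[:2500])
z_pred = knn.predict(colours[2500:])
frac = np.mean(np.abs(z_pred - z[2500:]) < 0.2)   # fraction with |dz| < 0.2
print(frac)
```

Concatenating the three colour systems, as the best input pattern does, simply widens the feature vector passed to `fit`.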
MAPPER Lithography is developing a maskless lithography technology based on massively parallel electron-beam writing combined with high-speed optical data transport for switching the electron beams. With 13,000 electron beams each delivering a current of 13 nA on the wafer, a throughput of 10 wph is realized for 22 nm node lithography. By clustering several of these systems together, high throughputs can be realized in a small footprint. This enables a highly cost-competitive alternative to double patterning and EUV.
The most mature and reliable electron source currently available that combines high brightness, high emission current and uniform emission is the dispenser cathode. For this electron source a reduced brightness of 10^6 A/(m^2·sr·V) has been measured, with no restrictions on emission current. With this brightness, however, it is only possible to realize a beam current of 0.3 nA (at a 25 nm spot size), which is almost a factor of 50 lower than the 13 nA required for 10 wph.
Three methods can be distinguished to increase the throughput:
1. Use an electron source with a 50× higher brightness
2. Increase the number of beams and lenses 50×
3. Patterned beams: Image multiple sub-beams with each projection lens
MAPPER has selected option 3, 'Patterned beams', as the method to increase the beam current to 13 nA. This is because an electron source with a 50× higher brightness is simply not available at this time, while increasing the number of beams and lenses 50× leads to undesirable engineering issues.
Over the past years MAPPER has been developing the 'Patterned beams' concept. By imaging 7×7 sub-beams per projection lens, the beam current is increased to the required 13 nA level. This technique will also be used to maintain a throughput of 10 wph for smaller technology nodes by further increasing the number of sub-beams per projection lens.
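The numbers quoted above are easy to check back-of-envelope; the arithmetic below simply restates them.

```python
# Back-of-envelope check of the 'Patterned beams' figures: a single
# dispenser-cathode beam delivers ~0.3 nA, a factor ~50 short of the
# 13 nA target, while 7x7 sub-beams recover it.
single_beam_nA = 0.3
target_nA = 13.0
shortfall = target_nA / single_beam_nA      # ~43, "almost a factor 50"
sub_beams = 7 * 7
total_nA = sub_beams * single_beam_nA       # 14.7 nA, above the 13 nA target
print(shortfall, total_nA)
```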
In this paper we describe the electron-optical design used to image these multiple sub-beams per lens, together with an experimental demonstration of this electron-optical configuration. The writing strategy is also discussed, as well as the first patterning results. One of the key components for 'Patterned beams' is the beam blanker array, since each sub-beam must be switched on and off individually. The design of the blanker deflectors and circuitry, as well as experimental results from the blanker array, are shown. Finally, the roadmap to further technology nodes is discussed.
With large-scale multicolor photometric and fiber-based spectroscopic projects under way, millions of uniform samples are available to astronomers. In this situation, we have developed an automated system to estimate photometric redshifts for both galaxies and quasars. In this paper we give a thorough introduction to the system. We first describe the series of methods integrated into it, such as template fitting, the color-magnitude-redshift relation, polynomial regression, support vector machines and kernel regression, and indicate the merits and demerits of these approaches, so that users can choose a suitable algorithm according to data characteristics and science requirements. Then we present a case study to illustrate how the system works. To build a robust system with improved accuracy and speed of photometric redshift estimation, we pay special attention to algorithm choice and data preparation. From the user's viewpoint, an easy-to-use interface is provided. Finally, we point out promising techniques for measuring photometric redshifts and the application prospects of this system. In the future, the system will become an essential tool for automatically determining photometric redshifts in studies of the large-scale structure of the Universe and the formation and evolution of galaxies.
The Sloan Digital Sky Survey (SDSS) is an ambitious photometric and spectroscopic project, providing huge and abundant samples for photometric redshift estimation. We employ polynomial regression to estimate photometric redshifts using 330,000 galaxies with known spectroscopic redshifts from the SDSS Data Release Four spectroscopic catalog, and compare three polynomial regression methods, i.e. linear, quadratic and cubic regression, on different samples. This technique converges absolutely in a finite number of steps, gives a better fit with fewer coefficients and yields the result as a mathematical expression, making it much easier for astronomers to use and understand than other empirical methods. Our results indicate that it provides equal or better accuracy; the best r.m.s. dispersion of this approach is 0.0256. In addition, a comparison between our results and other work is presented.
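The linear/quadratic/cubic comparison can be sketched with ordinary least squares on polynomial features. The synthetic colour-redshift relation below is a stand-in for the SDSS DR4 galaxy sample, so the r.m.s. values it prints are not the paper's.

```python
# Sketch of polynomial-regression photo-z: fit polynomials of degree
# 1, 2, 3 in the colours by least squares and compare r.m.s. residuals.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
n = 5000
z = rng.uniform(0.0, 0.6, n)
# mildly nonlinear toy colour-redshift relation with photometric noise
colours = np.column_stack([z + 0.5 * z**2 + rng.normal(0, 0.02, n)
                           for _ in range(4)])

results = {}
for degree in (1, 2, 3):                       # linear, quadratic, cubic
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(colours[:4000], z[:4000])
    resid = model.predict(colours[4000:]) - z[4000:]
    results[degree] = np.sqrt(np.mean(resid ** 2))
print(results)
```

The fitted coefficients are themselves the "mathematical expression" the abstract praises: the relation can be published and applied without shipping a trained model.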
KEYWORDS: Astronomy, Knowledge discovery, Data mining, Galactic astronomy, Statistical analysis, Databases, Data modeling, Stars, Data processing, Data analysis
With the construction and development of ground-based and space-based observatories, astronomical data have grown to the terascale, even the petascale. How to extract knowledge from such huge data volumes by automated methods is a big challenge for astronomers. Many researchers have studied various approaches and developed different software tools to address this issue. For a particular data mining task, we need to select an appropriate technique suited to the characteristics of the data; moreover, all algorithms have their own pros and cons. We introduce the characteristics of astronomical data, present a taxonomy of knowledge discovery, and describe the functionalities of knowledge discovery in detail. The methods of knowledge discovery are then touched upon. Finally, successful applications of data mining techniques in astronomy are summarized and reviewed. Facing the data avalanche in astronomy, knowledge discovery in databases (KDD) shows its superiority.
Facing a data avalanche, astronomy now covers data from the radio, infrared, optical and X-ray bands, up to gamma rays; astronomy has entered an all-sky-survey era. Transforming data into knowledge depends on data mining techniques, and how to extract knowledge from databases effectively and efficiently is an important issue; mining knowledge across different bands, or from multiband data, is of special significance. In this paper we design a system comprising four fundamental blocks: the first creates databases; the second cross-matches objects from different bands; the third mines knowledge from the large data volume; and the last evaluates the final results. The functionalities of the four blocks are described. The cross-match results are divided into categories, and the analysis mode for each of them is touched upon. Moreover, the schemes for classification, regression, clustering analysis and outlier detection are demonstrated.
The federation of data from distributed locations, different archives and different wavelengths can lead to new discoveries; moreover, it is an important part of the functionality of the Virtual Observatory. We review the technical challenges involved and develop a system that focuses on providing a robust framework for efficiently extracting data from different sources into science-grade data for the convenient use of astronomers. The system consists of several tasks wrapped together in an integrated framework: automated database creation, rapid catalog queries, cross-match queries and visualization of query results. For the cross-matching service in particular, many choices are provided to users, such as one-to-one, one-to-many, one-to-none and none-to-one entries. Meanwhile, the probability of each cross-match is given. In addition, users may select attributes and attribute ranges according to their requirements. We will further improve the system in various respects according to the standards of the IVOA.
The ability to measure redshift accurately from photometric data is of great importance for studying cosmology, the large-scale structure of the Universe, the determination of fundamental astrophysical quantities and so on, because photometric redshifts provide approximate distances to enormous sets of objects. Various algorithms for photometric redshifts have been investigated, which led us to develop a software platform that integrates different algorithms for estimating photometric redshifts, such as the color-magnitude-redshift (CMR) relation, Support Vector Machines (SVMs), HyperZ and Artificial Neural Networks (ANNs). The requirements and architectural issues of the platform are addressed, and the implementation of its framework design is discussed. It provides a user-friendly interface, through which users can choose the method they like, upload their own data, and obtain the result they need with a mouse click. The framework is flexible and extensible enough for measuring photometric redshifts.
An important data preprocessing step in data mining is feature selection, which improves the performance of data mining algorithms by removing irrelevant and redundant features. By positional cross-identification, multi-wavelength data for 1656 active galactic nuclei (AGNs), 3718 stars and 173 galaxies are obtained from the optical (USNO-A2.0), X-ray (ROSAT) and infrared (Two Micron All-Sky Survey) bands. In this paper we apply a filter approach named ReliefF to select features from the multi-wavelength data, and then use a naive Bayes classifier on the resulting feature subsets, comparing the results with and without feature selection, and with and without adding weights to the features. The results show that the naive Bayes classifier based on the ReliefF algorithm is robust and efficient for preselecting AGN candidates.
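The filter-then-classify pipeline can be sketched as follows. For brevity this implements plain binary Relief (a simplified cousin of the ReliefF used above: one nearest hit and one nearest miss per sampled instance) and feeds the top-weighted features to a Gaussian naive Bayes classifier; the data are synthetic, with three deliberately irrelevant features.

```python
# Sketch: Relief-style feature weighting, then naive Bayes on the
# top-ranked features. Features 0-1 carry the class; 2-4 are noise.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 600
y = rng.integers(0, 2, n)
informative = y[:, None] + rng.normal(0, 0.5, (n, 2))
noise = rng.normal(0, 1.0, (n, 3))
X = np.hstack([informative, noise])

def relief_weights(X, y, n_samples=200):
    Xn = (X - X.min(0)) / (X.max(0) - X.min(0))   # scale features to [0, 1]
    w = np.zeros(X.shape[1])
    for i in rng.choice(len(Xn), n_samples, replace=False):
        d = np.abs(Xn - Xn[i]).sum(1)
        d[i] = np.inf                              # exclude the point itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))   # nearest same-class
        miss = np.argmin(np.where(y != y[i], d, np.inf))  # nearest other-class
        # informative features differ more across classes than within
        w += np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])
    return w / n_samples

w = relief_weights(X, y)
top = np.argsort(w)[::-1][:2]                      # keep the top-2 features
acc = cross_val_score(GaussianNB(), X[:, top], y, cv=5).mean()
print(top, acc)
```

Full ReliefF averages over k hits and misses per class and handles multi-class data; the weighting idea is the same.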
Astronomical data sets have experienced an unprecedented and continuing growth in volume, quality and complexity over the past few years, driven by advances in telescope, detector and computer technology. Like many other fields, astronomy has become a very data-rich science, with information content measured in multiple terabytes and even larger, multi-petabyte data sets on the horizon. To cope with this data flood, the Virtual Observatory (VO) federates data archives and services, representing a new information infrastructure for astronomy in the 21st century and providing a platform for scientific discovery. Data mining promises both to make the scientific utilization of these data sets more effective and more complete, and to open completely new avenues of astronomical research. The technological problems range from database design and federation to data mining and advanced visualization, leading to a new toolkit for astronomical research; similar challenges are encountered in other data-intensive fields today. Outlier detection, as one of the four knowledge discovery tasks, is of great importance: the identification of outliers can often lead to the discovery of truly unexpected knowledge in various fields. In astronomy especially, astronomers are keen to discover unusual, rare or unknown types of astronomical objects or phenomena, and outlier detection approaches for large data sets meet this need exactly. In this paper we provide an overview of some techniques for the automated identification of outliers in multivariate data. Outliers often carry useful information: identifying them is important not only for improving the analysis but also for flagging anomalies that may require further investigation. The techniques may be used during data preprocessing and also for preselecting special object candidates.
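One standard technique of the kind surveyed above is the robust Mahalanobis distance: fit a robust covariance (Minimum Covariance Determinant) and flag points far from the bulk. The sketch below uses synthetic data with a small planted cluster of anomalies; it illustrates the idea, not any particular survey's pipeline.

```python
# Sketch of multivariate outlier detection: robust Mahalanobis
# distances from a Minimum Covariance Determinant fit.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(8)
inliers = rng.normal(0, 1, (500, 3))
outliers = rng.normal(8, 1, (5, 3))        # a small cluster of anomalies
X = np.vstack([inliers, outliers])

mcd = MinCovDet(random_state=0).fit(X)     # robust location and covariance
d2 = mcd.mahalanobis(X)                    # squared robust distances
flagged = np.flatnonzero(d2 > np.quantile(d2, 0.99))
print(flagged)                             # the planted anomalies stand out
```

The robust fit matters: a plain sample covariance would be inflated by the outliers themselves and could mask them, which is exactly the failure mode these methods are designed to avoid.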
KEYWORDS: Galactic astronomy, Principal component analysis, Data archive systems, Data mining, Mining, Databases, Data modeling, Astronomy, Spectroscopy, Stars
The Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) will yield 10 million spectra of a wide variety of objects, including QSOs, galaxies and stars. The data archive of one-dimensional spectra, which will be released gradually during the survey, is expected to exceed 1 terabyte in size. This archive will enable astronomers to explore the data interactively through a friendly user interface. Users will be able to access information related to the original observations as well as spectral parameters computed by an automated data-reduction pipeline. Data mining tools will enable detailed clustering, characterization and classification analyses. The LAMOST data archive will be made publicly available in the standard data format for Virtual Observatories and in a form fully compatible with future Grid technologies.
The Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) will soon be set up and tested. A fully automated software system for reducing and analyzing the spectra has to be developed before the telescope is finished. Requirement analysis has been carried out and the data model has been designed. This paper gives an outline of the software design, including the data design, the architectural and component design, the user interface design, and the database for the system. It also presents an example algorithm, PCAZ, for redshift determination.
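The details of PCAZ are not given here, but the general idea of PCA-based redshift determination can be sketched: build eigenspectra from rest-frame templates, then, for each trial redshift, de-redshift the observed spectrum, project it onto the eigenspectra and keep the redshift with the smallest reconstruction error. Everything below (the toy Gaussian-line templates, the grids) is a hypothetical illustration, not the LAMOST pipeline.

```python
# Highly simplified sketch of PCA-based redshift fitting: minimise the
# PCA reconstruction error of the de-redshifted spectrum over trial z.
import numpy as np

wave_rest = np.linspace(3500, 7000, 800)          # rest-frame grid (Angstrom)

def toy_spectrum(centre, width=30.0):
    # toy spectrum: a single Gaussian emission line
    return np.exp(-0.5 * ((wave_rest - centre) / width) ** 2)

# "training" spectra -> mean spectrum + eigenspectra via SVD
train = np.array([toy_spectrum(c) for c in (4000, 4500, 5000, 5500)])
mean = train.mean(0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:3]                               # top-3 eigenspectra

def pca_redshift(wave_obs, flux_obs, z_grid):
    errs = []
    for z in z_grid:
        # shift the observed spectrum to the rest frame of trial z
        rest = np.interp(wave_rest, wave_obs / (1 + z), flux_obs)
        coeffs = components @ (rest - mean)       # project onto eigenspectra
        model = mean + coeffs @ components        # PCA reconstruction
        errs.append(np.sum((rest - model) ** 2))
    return z_grid[np.argmin(errs)]

z_true = 0.12
wave_obs = wave_rest * (1 + z_true)
flux_obs = np.exp(-0.5 * ((wave_obs - 5000 * (1 + z_true))
                          / (30 * (1 + z_true))) ** 2)
z_est = pca_redshift(wave_obs, flux_obs, np.arange(0.0, 0.3, 0.01))
print(z_est)   # should recover z close to 0.12
```

Real pipelines fit the projection coefficients with proper noise weighting and search a much finer redshift grid, but the minimise-reconstruction-error structure is the same.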
To explore the spectral energy distributions of various objects in a multidimensional parameter space, multiwavelength data for quasars, BL Lacs, active galaxies, stars and normal galaxies are obtained by positional cross-identification from the optical (USNO-A2.0), X-ray (ROSAT) and infrared (2MASS) bands. Different classes of X-ray emitters populate distinct regions of this multidimensional parameter space. In this paper an automatic classification technique, Support Vector Machines (SVMs), is put forward to classify them using 7 parameters and 10 parameters. The results show that SVMs are an effective method for separating AGNs from stars and normal galaxies, both with data from the optical and X-ray bands and with data from the optical, X-ray and infrared bands. Furthermore, we conclude that classification is influenced not only by the method but also by the chosen wavelengths, and it is evident that the more wavelengths we use, the higher the accuracy.