| |
SQL Server - Data Cleansing
on Large Text Columns
Recently I performed a data-cleansing task for a large publishing firm
that felt very much like microsurgery. The requirements were simply stated
as convert all forms of a specific type of term to uppercase in a table
containing over 30,000 rows of article data Upon gathering information
from the client, I requested a source of the terms that I could use for
locating the suspect entries and make the conversion. That's when things
got interesting and challenging. First, there was no such reference list
in existence, and second, the data existed in a variety of formats. In
summary, there was over 8,000 different terms that could exist in the set
of rows, and each could be in a variety of formats. Furthermore, the data
is contained in the SQL Sever 7.0 nText data type with the largest
row containing approximately 100,000 characters.
The process for solving this puzzle involves a two-step process. Step 1
identifies the data that will be changed and it's location. Step 2
executes the process against the data itself. Building in an extra step
allows the developer to verify the algorithm and generate interim reports
for management and quality assurance team members to verify the process
before actually affecting the data. One critical aspect that must be
followed throughout the process is what I like to call "Balance".
If the rules are very intricate, unrelated data might be adversely
affected, and the converse is true as well. Not enforcing enough rules
will invalidate the effort as well. Keep in mind that a panacea to many
data cleansing efforts is non-existent and that expectation setting with
the client and/or management is critical to a successful outcome. We
established a very fair 80/20 rule for this process.
Step 1 involved a process of splitting the contents of each row into
individual words and storing them in a table that held the word, it's
position in the original data row, and the ID of that row in the original
source as well. We broke on spaces and the close tag '>' for
HTML because the table rows also contained these tags as well. The process
was done using an INSENSITIVE CURSOR. As you can expect the table
grew in size quickly. Actually the row count was approximately 850,000.
Next, I created a set of SQL statements that would extract rows from the
secondary table allowing me to experiment with different combinations that
would yield only the data we wished to affect. The optimal statements were
used to generate a report that would show the item being converted, it's
article, and how it would appear after modification using the UPPER
function.
Step 2 also involved the use of an INSENSITIVE CURSOR to loop
through the reference table that was created from Step 1 and make all of
the updates. This was easy because my secondary table holds the value;
it's position, and source id. I had to use the UPDATETEXT command
as we were affecting nText data. Additionally, the Select Into / Bulk
Copy option was used as well to disable logging for the use of the UPDATETEXT
statements.
Lastly, I built a small ASP page with an OLEDB connection into
the database that allowed the QC staff to check the contents of the
articles against the original data. The OLEDB provider allows me to avoid
the problems encountered with large text fields and ADO result sets
that occurred using the SQL SERVER Driver. The script was run
against the live data during the overnight as not to impact the server
performance. Execution time averaged 1.5 hours.
eBallot - The Future of
Elections
Firstly, this is not an attempt for me to state my political
affiliation. This is merely an expansion of a discussion that several
colleagues and me had regarding the potential use for electronic polling
devices in future elections. This discussion surfaced as a result of the
recent events surrounding the presidential election with respect to the
clarity of the ballot format, the process of paper balloting, and the
availability of real-time results information.
The ability to record and quantify the results would be quite easy
considering the use of the Internet and the availability of inexpensive
high performance hardware. The core issues for tackling the problem
pertain to the user interface of an electronic polling booth and the
method for securely capturing and moving the data.
The interface itself would need to be a large non-glare display that
would present only several key ballot items on the display at one time.
These items would contain text and images to cater to those who are unable
to read or those who are visually impaired such as the elderly. An audio
interface with an earpiece would be available for blind voters. Ballot
selections would be made by the user tapping an entry on the screen with a
stylus that is connected to the booth, much like a pen on a string or
chain. The user would be prompted to select specific items on the screen
prior to starting to insure that they are aware of the operation of the
e-Ballot before starting. The mechanism for scrolling on the ballot would
be a physical slider bar on the right hand side of the display that would
smoothly move the display up or down depending upon the users actions.
Essentially user experience is enhanced by the use of both graphical
content and manual navigational devices.
The idea of introducing an electronic method for balloting is not new.
The ability to securely capture and record the voters selections is now
possible with the usage of encryption technology that is more effective
than that used by online e-commerce sites. The use of the stylus
introduces a type of security commonly used with Smart Card technology.
Security can be defined as something you know, who you are, and something
you have. Most uses on the web involve the first two; yet as biometrics
and other technologies mature, then the third will become more
commonplace. Smart Card technology bridges that gap by requiring the user
to possess a physical device prior to and during all system interaction.
Encryption technology elevated to 128-bit security using a series of
complex mathematical functions and algorithms can guarantee a secure
transaction. The stylus presented above would act as the Smart Card
device. This would prevent voting tampering from occurring beyond the
polling locations.
Data captured from the polling location would need to be replicated
thousands of times in distributed locations to insure redundancy and
re-correlated using a CRC-like algorithm for verification. A real-time
system such as this would probably require some enhancements to the
process of verifying identification at a polling location and the process
for submitting the ballot as well. There are implications to having such
real-time information which could alter voter behavior.
SPECIAL ISSUE
- Creating a Web
Based Management Tool for Microsoft SQL Server Jobs
This article is part of the "Special Issue" series where you
are provided with a high level of detail about a given area. In this
issue, you will learn how to develop both the front and back end
components for a web based tool that manages SQL Server jobs. The tool
called "JobWatch" allows the user to view the last
execution status of scheduled jobs and force a manual execution if
desired. The audience level for this topic is rated Intermediate, due to
the required understanding of ASP, SQL Server implementation and
administration.
"JobWatch" is a single ASP page that makes use of the sp_help_job
and sp_start_job system stored procedures for monitoring or
starting jobs respectively. The page itself contains a table filled with
the job status information and a green arrow "play button" that
allows the user to start a job. The center of the page contains a "Refresh"
button that updates the last status and execution date/time stamp column
for the jobs. You can explore using an automated refresh method with
JavaScript as a way to enhance the tool.
We will start the lesson by creating the stored procedure in Microsoft
SQL Server 7.0 that will trigger an existing job. I'm assuming that you
have an existing job that you wish to place under the control of the
"JobWatch" tool. The stored procedure should be created as
follows:
CREATE proc sp_GetData
as
- declare @returncode int
-
- exec @returncode = msdb..sp_start_job @job_name = 'dtsGetData'
-
- return(@returncode)
|
The procedure is rather
simple and makes use of the Master Database procedure for executing
jobs. We will not make any use of the Return Code for this exercise. You
should also remember the SQL account that is required to connect and run
the above stored procedure. We will assume it is user "jobwatch"
and password "cooltool". Test the stored procedure in the SQL
Query Analyzer by executing the following line and then monitoring the
output in the Job section of the SQL Server Enterprise Manager:
exec sp_GetData
The last thing you want to
do on the SQL Server side is gain an understanding of the parameters and
results for the system stored procedure sp_help_job.
Next, we will create the ASP page that glues the functionality together
and creates our useful tool. You will use the Dictionary object to
store the description and name of the jobs you wish to monitor and
control. The ASP script will provide the following functionality:
If the page is refreshed as the result of a Job Execution request (user
presses the green arrow "Play Button"), execute the stored
procedure for the job we created above. Next all of the necessary
Dictionary Objects are loaded with the Job information and equivalent
textual representation for the job status information that will be
returned from the sp_help_job stored procedure. Notice that the status
information is loaded into the Dictionary using color and font formatting.
This simply adds flavor to the display for different job outcomes. Lastly,
the stored procedure sp_help_job is executed and the resultset is used to
build a table with all of the corresponding last status and execution
information. This implementation was simplified by naming the stored
procedures and associated jobs with a naming convention that allowed the
top part of the program to determine which stored procedure to execute
based upon the job name.
Download
the ASP Listing for JobWatch
The program does make use of the adovbs.inc include file for the use of
ADO constants. You might want to explore the use of a single stored
procedure that executes any of the jobs requested by accepting a
parameter.
Database Transformation
Service (DTS) Jobs and Permissions
This article is a follow-up to one that appeared last month titled "The
Power of Microsoft SQL Server 7.0 Data Transformation Services."
The previous article described the use of Microsoft SQL Server 7.0's
Database Transformation Services (DTS) for the importing and exporting of
data from a variety of sources. Using DTS in such a manner is quite
intuitive when following the Wizard or Package Designer Interface, which
allows the user to interactively design and execute their packages. The
difficulty begins when the user wishes to run these DTS packages in an
unattended manner on a schedule with data being imported or exported to
some location on the network other than a SQL Server database.
DTS packages use the current users Windows NT and/or SQL permissions
when execution is done interactively from within the Microsoft SQL Server
Enterprise Manager. The Windows NT permissions allow the package to gather
data from a network location such as a folder and the SQL credentials
allow this data to make it's way into the designated tables. When
attempting to start the job that is associated with such a package, the
SQL Agent's set of permissions is used. If the SQL Agent does not have
permissions to go to a network folder and read it's source, the job will
surely fail. Furthermore, when using a Microsoft Access database as the
source, be sure that the SQL Agent has WRITE permissions as well as
READ permissions, because opening an MDB file requires the creation
of an LDB (Lock) file. The importance of this topic is realized when you
have a system that users interact with during normal business hours and
whose data is imported at night on a scheduled basis.
Getting the Right People
for the Job
When building a team from existing resources, or looking to hire from
outside the firm, always remember that their technical and personal skills
will have a direct effect on the deliverable. Having an individual with an
excellent technical skill-set is paramount; yet their ability to function
diplomatically, confidently, and especially enthusiastically in a team
approach is equally important.
Many companies are shifting to the team room approach commonly referred
to as "Solution Labs" or "War Rooms". These are
environments where developers work in close quarters with each other and
emphasis is placed on open communication for resolving technical issues
facing the team. The candidate must be able to articulate their own
thoughts and express them in a manner conducive to the group environment
with confidence and not be intimidated in such a setting.
Understanding the context of the candidate is helpful in assessing
their fit for the job. Work with the candidate to insure that any
apprehension on their part is not a lack of knowledge, but maybe just
nervousness. Stay on track with questioning as it pertains to the role
they must fill. Provide the opportunity for the candidate to describe a
challenge they encountered and the method they chose for resolution. Often
an energetic person will gain more ground on a project then one who is
more technically savvy, but less motivated. One good technique for
interviewing senior level resources is to provide them with a question in
an area unknown to them. This type of questioning will usually not yield
the perfect answer, but the best candidates will diplomatically present
ideas and ask questions to gain an understanding of the topic.
Does the candidate really love the type of work they've done in the
past and are applying for in your firm? Are they motivated to seek answers
for the most efficient and effective solution to a problem? A good
candidate will often express an interest in the technology beyond what he
or she does in the normal job setting. This might include reading
journals, teaching, or managing the content of a web site.
Lastly, ask yourself the following question when ready to make a
decision: "Would I hire this person to work on this project if
this were my own business"
The Power of Microsoft
SQL Server 7.0 Data Transformation Services
The ability to import and export data to and from Microsoft SQL Server
7.0 has been simplified with the use of Data Transformation Services
(DTS). This tool replaces the Bulk Copy Program (BCP) that was used in a
similar capacity in versions of the Microsoft SQL Server product prior to
the version 7.0 release. BCP required the user to manipulate a text file
that defined the order, type, size, and rules for importing data. The
utility worked well; yet was less intuitive than DTS. The user can now
define complex import and export tasks called "Packages" by
using the DTS Wizard, or the intuitive graphical designer tool. In either
case, the user can define the source, destination, and a set of rules for
moving the data. These rules can be specified using SQL queries and
VBScript. Once the Package is defined, DTS will create a JOB for the user
that can be scheduled and enabled when desired. The truly exciting part of
this utility is that it is extremely fast, supports a variety of sources
and destinations, and is easily customizable.
Recently I imported 4 large tables in under 5 seconds from Microsoft
Access into a Microsoft SQL Server database. The network at this location
was slow and the source and destination objects where in two separate
facilities in Boston. DTS can be used for some really powerful commercial
applications. It is simple to define a Package that can run on a scheduled
basis that extracts mission critical SQL Server data and loads it into a
Microsoft Excel Spread Sheet for use by management. Another practical
application involves importing large data feeds into SQL Server, which is
a common practice in the Insurance and Health Care industry.
Creating Link Tables in
Microsoft Access without using a DSN
This is a quick tip for Microsoft Access developers who want to
programmatically link to tables in a Microsoft SQL Server database. The
developer gains control over the connection to the database and tables,
and is able to shield the user from possible problems in a graceful
manner. In addition, the need to maintain a DSN on the users
machine or a shared location is eliminated. The code sample provided
assumes that the table SourceCompany has not been linked. You could
include logic to detect the existence of the LinkedCompany table in
the tabledef object, or use an on-error statement and block
to detect the failure when the table already exists in the collection.
Dim dbcurr As database
Dim tbd As TableDef
Set dbcurr = CurrentDb()
Set tbd = dbcurr.CreateTableDef("LinkedCompany")
With tbd
- .Connect = "ODBC;DRIVER=SQL Server;SERVER=server;" &
_
- "UID=username;PWD=password;DATABASE=database"
- .SourceTableName = "SourceCompany"
End With
dbcurr.TableDefs.Append tbd
The code creates a tabledef and current database object. Next
the desired link table name is specified in the CreateTableDef
statement. This is a placeholder at this point because we have not
established the link yet. The With clause defines our ODBC
connection and the name of the external table we wish to link. Lastly,
the new tabledef is appended to the tabledef collection of
the current database. The link will persist when the Microsoft Access
database is closed. Remember to press the F5 (Refresh) key in the
Tables window in Microsoft Access for the new link table
to appear.
SPECIAL ISSUE - Effectively
Handling Control-Breaks
The term Control-Break normally refers to the handling of data that
comes back in a format where the developer must implement logic to
determine when to start a new section heading or produce sub totals.
Usually this is handled in a loop where values are stored in variables
called "Current" and "Previous" and they
are continually checked. When an inequality is detected, the program
performs another action, sets the variables, and then continues execution.
The code for these situations can become a bit confusing especially if the
hierarchy (Control values) are several levels deep. The following diagram
represents the hierarchy that we want to manipulate: (Note: The
colors have been included to clarify the explanation.)
Parent Level Category
Sub Level Category
- Child
- Child
- Child
- Child
Sub Level Category
- Child
- Child
- Child
Parent Level Category
Sub Level Category
- Child
Sub Level Category
- Child
- ...
- ...
|
Note that the number of Child rows under each Sub Level
Category varies and that in this case, any number of Sub Level
Categories and Parent Level Categories can exist as well.
A solution to this scenario involves an organized approach to returning
the data to the program with embedded codes that instruct the program how
to handle each row. The program logic is simplified and the probability of
an error is minimized. The following represents the data that would be
returned from a SQL statement using the scenario from above:
SELECT d.ParentCategoryLevel,
- d.SubCategoryLevel,
- d.Child
FROM tblData d
ORDER BY d.parentCategoryLevel, d.SubCategoryLevel, d.Child
| |
Parent Level Category |
Sub Level Category |
Child |
| |
Parent Level Category |
Sub Level Category |
Child |
| |
Parent Level Category |
Sub Level Category |
Child |
| |
Parent Level Category |
Sub Level Category |
Child |
| |
Parent Level Category |
Sub Level Category |
Child |
| |
Parent Level Category |
Sub Level Category |
Child |
| |
Parent Level Category |
Sub Level Category |
Child |
| |
Parent Level Category |
Sub Level Category |
Child |
| |
Parent Level Category |
Sub Level Category |
Child |
The data is returned in
this manner for performance reasons. Executing sub-queries (stored
procedures) at each level would be inefficient. Now imagine the data is
returned with the proper integer codes:
| |
Code 1 |
Parent Level Cat. |
NULL |
NULL |
| |
Code 2 |
NULL |
Sub Level Cat. |
NULL |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 2 |
NULL |
Sub Level Cat. |
NULL |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 4 |
NULL |
NULL |
NULL |
| |
Code 1 |
Parent Level Cat. |
NULL |
NULL |
| |
Code 2 |
NULL |
Sub Level Cat. |
NULL |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 2 |
NULL |
Sub Level Cat. |
NULL |
| |
Code 3 |
NULL |
NULL |
Child |
| |
Code 4 |
NULL |
NULL |
Child |
Notice that the Code 4
is used to indicate the end of a Parent Level Category section.
This terminating code is used by the program to indicate when an
operation should be executed, such as closing a file or starting a new
heading. The pseudo-code for this scenario would appear as follows ("rs"
is the result set containing the returned data):
DO WHILE NOT rs.eof
SELECT CASE rs("Code")
CASE 1
This is a Parent Level Category. Open a file and make the text
large and bold.
CASE 2
This is a Sub Level Category. Indent the text and make it bold.
CASE 3
This is a Child row. Indent and use standard text.
CASE 4
This is a terminating row. Close the file.
END SELECT
rs.MoveNext
LOOP
The SQL (Structured
Query Language) statements are the real key to this solution. By using
the UNION and ORDER BY statements, the data can be pieced
together in the format we desire. The following SQL statements achieve the
objective:
SELECT ParCL = d.ParentCategoryLevel,
- SubCL = NULL,
- Child = NULL,
- Code = 1
FROM tblData d
UNION
SELECT ParCL = d.ParentCategoryLevel,
- SubCL = d.SubCategorylevel,
- Child = NULL,
- Code = 2
FROM tblData d
UNION
SELECT ParCL = d.ParentCategoryLevel,
- SubCL = d.SubCategoryLevel,
- Child = d.ChildDataItem,
- Code = 3
FROM tblData d
UNION
SELECT ParCL = d.ParentCategorylevel,
- SubCL = NULL,
- Child = NULL,
- Code = 4
FROM tblData d
ORDER BY ParCL, Code, SubCL, Child
Using the ORDER BY
statement, we can take all of the rows and order them in the desired
format. This solution will return more rows which will impact performance
depending upon the size and number of data rows being scanned. I used this
solution to early-render HTML pages where the concern for
performance was important; yet the extra few seconds was acceptable. The
"payback" with this approach comes in the form of cleaner and
more manageable code. The strategic use of indexes and other tuning
enhancements can help performance as well.
Web-based Surveys – An
effective and efficient method for field data collection.
Over the years I've worked on projects that involved some form of data
collection and reporting. These efforts were usually industry
specific and involved collecting information from various companies on
compliance with government regulations and adherence to recommended
practices. After the data was collected from the field, reports were
generated in a variety of summary formats. The reporting process was
usually a set of Microsoft Access queries and reports with embedded
formulas. Data collection is the most difficult process to achieve
accurately and in a timely manner due to the dependency on your target
respondents.
Collecting data from the field usually involves several steps.
Identifying the data-points required from the field, method for delivering
and receiving the responses, and some form of follow-up with the
participants. Identifying required information and prodding the
participants to complete the surveys will not get any easier; yet the
method of delivery improves when utilizing a web-based approach. Consider
the following scenario:
I'm a regulatory agency tasked to oversee 100 national companies who
engage in the petro-chemical industry. Each is required to document their
compliance with various government regulations regarding pollution
abatement and waste water management. The information on these companies
is stored in either a Microsoft Access or SQL Server database. I have a
set of HTML/ASP pages created that contain all of the required survey
questions I require along with clear textual help and a 1-800 number to
assist the respondents. The start of the survey welcomes the user and
provides a friendly summary of the survey process. The last page dumps all
of the form variables into the Microsoft Access or SQL Server database.
Once I'm content with my survey, I then need to prepare my notification
that will be sent to the participants. I go through each row of the
Company table and generate an email via CDONTS or other third-party mail
object; yet I don't just include a simple link in the email. I want to
personalize the interaction, so I append their Company ID to the end of
the link. When they click on the link in their email, they are brought to
the survey and a nice welcome message is presented including their own
name. Of course, each page in the survey might validate the users
responses (then ASP's are used - not html) before they proceed to the next
page. Once the survey is completed, their responses are stored in the data
repository. At scheduled intervals it is possible to query the database
and identify anyone who hasn't completed the survey, and send them a
friendly reminder. One additional aspect to consider when using this
web-based approach. Some companies use older browsers which might not have
features enabled such as JavaScript and/or cookies. This could affect the
validation logic used on forms and should be factored into the design.
In conclusion, once the data is collected, it is possible to use
Microsoft Access or Excel to produce reports. All of these tools can be
linked together with little effort.
Keeping your Database
behind the Firewall - The Portability of XML
This is a further
exploration on the topic of early-rendering web pages to enhance site
performance (see "Early
Rendering - Portability and Performance"). In this scenario, the
database is kept behind the firewall, and all content for the site is
published according to schedule or when conditions warrant (i.e. the MSN
site). This publish process has been explained in previous articles of the
IT News; yet what about capturing the users data? All static
(early-rendered) pages are HTML pages; yet they've been generated with
HIDDEN form fields containing all or some data displayed on the page
itself. The user can select a link on the form to modify the data. This
link sends the user to an ASP page that uses the Request("form
variable") syntax to retrieve the data from the HTML page. The user
can modify the data and then press submit when complete. The data does not
get stored in the database; yet is loaded into an XML file using the
Microsoft Document Object Model (DOM). The file is then transferred via
CDONTS or other third-party mail object to a location inside the firewall.
This file can be parsed again using the DOM and presented to an
administrator for approval or automatically imported into the data
repository. The modifications will be visible to the user community when
the next publish cycle occurs. An email can be sent to the user as well to
notify them of the posting of their information. This is an excellent
strategy for a company-directory/product type of site with high volume.
User Interface
Ergonomics - Focusing on the User
I've always had a strong
interest in the usability of systems. Not just computer-based systems; but
any process that accomplishes some task. Years ago, I had an excellent
opportunity to visit Toyota's Kamigo Engine plant in Nagoya, Japan. The
level of automation in the facility was extremely high with all types of
numerically controlled machines placed strategically along the assembly
line. Despite this operations management miracle, their was a definite
concern for the human element. Any type of difficulty faced by the worker
needed to be rectified and resolved as not to affect the finely tuned
cycle times. Consideration for the users physical and emotional well-being
was continually addressed through a feed-back and control process
involving workers and a group leader. Many IT professionals spend a bulk
of their time concerned with rapid delivery of information to the user's
desktop; however, often neglect the importance of usability. Users who are
required to fight their way through multiple forms to complete a
transaction, or assist a customer, will become distressed. This can lead
to poor productivity, lower levels of customer service, and a loss of
revenue. It's essential to step back and look at a process. "What
is the intent of the process and how will the users expect it to
function?" Get prototypes in front of the users and listen to
their feedback on all details including items such as colors and fonts.
Many small points, if not addressed, can become large headaches later on.
These considerations apply to both web-based and traditional multi-tier
solutions.
Storing HTML in the
Database - The Advantages and Disadvantages
This article is similar to
one that appeared in the March 2000 issue titled "Early
Rendering - Portability and Performance". Building upon the
strategy of early rendering for speed and platform independence, the
publishing duration must also be considered. A database that receives
frequent updates might require all HTML pages to be pushed out to the site
several times each day. Certain layout and detail information on each of
these HTML pages can be pre-processed in the database and stored in a
single varchar field as a string containing both HTML tags and
column data for the record. This will dramatically improve performance
when looping through the ADO Result Set in VBScript and writing out each
detail page. Building the string in VBScript, handling each ADO column
independently, will degrade performance. Using a single ADO column with
the HTML, along with includes for the Header and Footer will super-charge
your solution. I've recently reduced the rendering time of 1600 pages from
45 minutes to 22 seconds using similar methods. These 1600 pages included
all sorts of Index, Detail, and Category lookup pages as well.
The improved performance does come at a price. Storing HTML in the
database is difficult to maintain. Getting to the HTML can be
inconvenient, and even less friendly to edit in a tool that doesn't
illuminate HTML and ASP tags in an intuitive manner. These issues can be
minimized by placing a minimal amount of formatting information in the varchar
field and limiting yourself to just the concatenation of the column data.
Other elements such as headers and footers, which are not dynamic, should
be handled as include files.
The Power of Microsoft
Access - Reducing Risk
Recently I was asked to
develop a simplified administration tool for a Microsoft SQL Server
database. The limited number of in-house users would want to have secured
access to the database and make routine modifications to it's content
throughout the business day. To accommodate this request we opted for
Microsoft Access as the tool of choice due to it's built in functionality
for handling the users interaction with each record and tabular/datasheet
display characteristics. Creating a simple Access Form and creating Link
tables to the SQL database was a breeze and completed in just several
hours. Taking actions like this, especially with proven tools and
techniques, will minimize your risk and cost on any IT effort.
The Future of Wireless
Web - A Prediction
As wireless web becomes
more popular and affordable, we will see a gradual migration towards
computing devices that no longer require direct connectivity via wires and
hubs. Users will interact with the web and corporate environments in
virtual domains. A corporation will contain a virtual domain for it's
users to access secured servers, email, and printers within there
facility, yet another virtual domain encompassing their business district
might also exist providing traffic and local event information. Imagine
the marketing potential for advertisers placing highly targeted content on
your laptop as you drive through a virtual domain (i.e. Times Square, New
York). Users could choose to subscribe to these virtual domains as they
drive through a foreign area seeking lodging and dining information.
Earning
Millions without a Computer - A True Business Case Study
A Boston area dental
practice earns millions in revenue each year without relying on any IT
infrastructure. The practice contains 7 patient treatment rooms, 15
employees, at least 5000 annual bookings, and specializes in general,
pediatric (child), and endodontic services. The office utilizes modern
x-ray machines in every room, constant interaction with an outside lab for
various dental appliances such as dentures, and participates in many major
insurance provider programs. The practice has been serving the community
for over 50 years, and has really focused on high-quality personalized
care, which leads to numerous repeat customers seeking a variety of
treatment plans from oral hygiene to emergency surgery.
Essentially the practice generates a level of customer demand that they
can efficiently match with quality service. The added cost of implementing
an automated office management and marketing system, as well as the
increased customer demand that such a system might generate, might
actually harm the balance they've established. The dental office is not an
IT shop. The goal of the business is to generate revenue, provide some
form of satisfaction to the dental practitioners who enjoy practicing
their skills, and to serve the community with excellent care. They are
doing this quite well and agree that deferring an IT implementation is
probably the best decision at this time. This conclusion is often
overlooked and leads to more than just toothaches after time and money has
been spent.
Populate JavaScript
DHTML with Database Content
Recently I was tasked with
the requirement to place a collapsible tree on the main page of a very
prominent web site. I knew this involved dynamic HTML and found a
JavaScript sample and code that would be an excellent fit. Later my client
informed me that the content of the tree would change quite frequently and
that they'd really like a data-driven solution for this form element. This
presented an interesting problem because the tree was all JavaScript
(client-side) and any database calls were reserved for VBScript
(server-side). After careful consideration I arrived at a rather simple;
yet flexible solution. I'd build a string of JavaScript code while looping
through the ADO Result Set in my VBScript, and set a hidden form element
to this value. Upon the execution of the "OnLoad" function in
the <BODY> tag, a call would be made to a JavaScript Init() function
that would evaluate the value of the hidden form element containing the
string of JavaScript commands. There were some slight differences in the
order of interpretation between Microsoft's Internet Explorer and Netscape's
Navigator browsers. The server-side building of the string was built
before the <BODY> tag to satisfy Netscape's Navigator
browser. This was indeed a fun challenge.
Data Cleansing Tips
& Tricks
Data cleansing efforts can
be resource intensive and should contain a checkpoint at some specified
point where a "go" or "no go" decision is considered.
I've had the opportunity to execute both a manual and automated process at
several clients. An automated method usually entails the generation of
Match Codes containing standardized names and portions of the address and
postal code. These codes are then grouped, compared, and the associated
records are collapsed into a specific parent based upon equality and the
business unit's rules. A manual process involves a human operator going
through the data and modifying, deleting, or merging records, also based
upon the business unit's rules. Both approaches usually entail some form
of standardization and correction of data such as names containing words
like Corporation, Corp., Inc., Incorporated, and Company. States, regions,
and countries are another area frequently requiring correction. Some
processes involve inspecting area and country codes to resolve missing
states and/or regions. Other creative and accurate methods can be
employed to correct a variety of missing values.
I'd like to explain an interesting approach to the manual method
described above. Microsoft SQL 7.0 Enterprise Manager now contains
a grid-like display of a table much like Microsoft Access. Before
the process begins, I insert two columns into a cleansing candidate table.
The first column contains a sequence number starting at 1 and going to the
record count. This value is assigned once the records are sorted by a
common column such as Company Name or Product Number. The user will look
for duplicates and usually the duplicates will fall together after a sort
is performed. The second column is called a status where the user can
enter codes. The codes I use are "D" for delete, "E"
for export, "MC,xxx" for merge child with parent at id xxx where
xxx is the id from my first column defined above. This allows the operator
to sort on any column; yet preserve their codes and dependencies. After
the operator is finished, I execute SQL scripts that look at the codes and
execute accordingly. Each code is preceded with a character such as 'x' to
identify it as being completed and to indicate to the script that it
should be ignored on subsequent executions. Keeping the rows until the end
allows me to re-visit and correct any anomalies if they arise. Additional
codes can be defined and built into the SQL scripts depending upon the
requirements of the business unit's rules and requirements. Prior to
delivery, I remove the additional columns.
Microsoft Remote
Scripting - An ASP Miracle
Remote Scripting technology
allows the developer to deliver ASP web solutions that behave similar to
their traditional client/server counterparts. A great use of this
technology involves a dependant listbox scenario. The user selects a
country from one dropdown, and the regions or states within that country
appear in another dropdown without refreshing the page. This
technology incorporates a server-proxy architecture which is only required
on the server and is freely available. Remote Scripting currently is
supported by both Internet Explorer and Netscape Navigator;
yet the developer will encounter differences in the method of invocation.
Early Rendering -
Portability and Performance
The reliance on database
technology to support the dynamic content of web solutions can lead to a
degradation in performance as the number of hits to the site increases.
One leading publishing company in New England employs a technique called
Early Rendering to meet this hurdle head-on. Developers create ASP
components that generate HTML pages during the publishing phase using
Microsoft's Interpress product. All database calls are completed
during the publishing step, which generates all the required pages.
Surely this can lead to an increased number of pages for e-commerce
solutions; yet the performance improvements are remarkable when
considering users lose interest after just a few seconds of waiting.
Using this technique to generate only HTML output can lead to platform
independent solutions. Currently the New England company deploys on
the NT platform.
Traditional
Client/Server Solutions - A Deployment Nightmare
Creating applications in
tools such as Visual Basic traditionally required a business entity to
rely on the services of their internal IT group to deploy these
applications and any subsequent releases to their user community.
Imagine the logistical nightmare involved when a portion of the staff
consists of field sales representatives that rarely see the corporate home
office. Web based solutions can be deployed on a secure Intranet and
made instantly available to all users without the installation hassle.
Remote users can connect to these same systems using SSL or other PK
encryption technology. Another benefit involves the notion of a
thin-client architecture. End-user computers would only require an
Internet browser and the corporate standard issue of productivity software
such as word processors and spread sheet applications.
Back
to Top
|